A Social Feed at Instagram Scale

A social feed looks obvious. Show me recent posts from people I follow, newest first. Then you learn the numbers: hundreds of millions of users, some following thousands of accounts, some followed by hundreds of millions, and a feed that has to load in under a second every single time. The naive query dies instantly, and the real design is one of the most instructive in all of system design.

What we're building

Functional · what it does

Show recent posts from accounts I follow
New posts appear quickly
Infinite scroll into the past
Ranked order, not just strict reverse-chronological order
Like and comment counts on each post

Non-functional · what it must survive

Hundreds of millions of users
Feed loads in well under a second
Handle accounts with 100M+ followers
Read-heavy: far more scrolling than posting
Stay up and fresh during traffic spikes

The one fact that drives the whole design: this is overwhelmingly a read workload. People scroll constantly and post rarely. That imbalance tells you to spend effort making reads cheap, even if it makes writes more expensive.

The naive design, and why it dies

The obvious query: "give me posts from everyone I follow, sorted by time, limit 20."

SELECT * FROM posts
WHERE author_id IN (/* the 800 people I follow */)
ORDER BY created_at DESC
LIMIT 20;

For one user, fine. At scale, this runs on every feed open for hundreds of millions of users, each one matching against hundreds or thousands of followed accounts and sorting the result, against a posts table with billions of rows. The database melts. You can't pay this cost on every read. So the real question becomes: when do you do the work, at write time or at read time?

The two strategies

Fanout-on-write (push)

When you post, immediately copy the post into the timeline of every follower. Reads are then trivial: a user's timeline is already built, so opening the app is one cheap lookup. The cost is at write time: one post by someone with a million followers means a million writes.

Fanout-on-read (pull)

Store posts once. Build each user's feed at read time by pulling recent posts from everyone they follow and merging. Writes are trivial (one row). The cost is at read time: every feed open does the expensive merge across everyone you follow.
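A minimal sketch of the pull path, assuming each author's posts are already kept newest-first. The names `posts_by_author` and `pull_feed` are hypothetical, posts are `(created_at, post_id)` tuples, and a plain dict stands in for the real posts store.

```python
import heapq
from itertools import islice

# Hypothetical in-memory post store: author_id -> [(created_at, post_id), ...],
# each list kept newest-first.
posts_by_author = {
    "ana": [(105, "a3"), (90, "a2"), (10, "a1")],
    "bo": [(100, "b2"), (50, "b1")],
    "cy": [(95, "c1")],
}

def pull_feed(following, limit=20):
    """Build a feed at read time by merging each followed author's posts."""
    streams = [posts_by_author.get(author, []) for author in following]
    # heapq.merge lazily merges already-sorted inputs; reverse=True because
    # every input list is sorted newest-first (descending).
    merged = heapq.merge(*streams, reverse=True)
    return [post_id for _, post_id in islice(merged, limit)]

feed = pull_feed(["ana", "bo", "cy"], limit=4)
print(feed)
```

Every read opens one stream per followed account, so the cost of a feed open grows with how many accounts the reader follows.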

Each is a disaster at the extreme the other handles well. Push is great for reads but explodes on celebrity posts. Pull is great for writes but makes every read expensive, especially for users who follow thousands of accounts. The push write count is the one that blows up fastest: a single post from an account with 100M followers means 100M timeline writes.

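A minimal sketch of the push version and where it breaks. The follower graph, the `publish_push` name, and the counts are all assumptions for illustration; 200,000 followers stands in for a celebrity so the example runs quickly.

```python
from collections import defaultdict

# Hypothetical follower graph: author_id -> list of follower ids.
followers = {
    "normal_user": [f"u{i}" for i in range(300)],
    "big_account": [f"u{i}" for i in range(200_000)],
}

timelines = defaultdict(list)  # user_id -> post ids, newest first

def publish_push(author_id, post_id):
    """Fanout-on-write: copy the post id into every follower's timeline."""
    writes = 0
    for follower_id in followers.get(author_id, []):
        timelines[follower_id].insert(0, post_id)
        writes += 1
    return writes

normal_writes = publish_push("normal_user", "p1")
big_writes = publish_push("big_account", "p2")
print(f"normal account: {normal_writes} timeline writes")
print(f"big account:    {big_writes} timeline writes")
```

One post costs exactly as many writes as the author has followers, so the write count scales linearly with audience size. Reading a timeline afterwards is a single lookup.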

The hybrid model everyone actually uses

The real systems combine both, and the rule is delightfully simple: push for normal accounts, pull for celebrities.

Normal user posts: Fanout-on-write. Copy the post into each follower's timeline. The follower count is small enough (hundreds, thousands) that this is cheap and reads stay instant.
Celebrity posts: Don't fan out. Writing to 100M timelines per post is absurd, and most of those followers aren't online to see it anyway. Just store the post once.
Any user reads their feed: Read their pre-built timeline (the pushed posts from normal accounts), then merge in recent posts from the handful of celebrities they follow, pulled at read time. Sort and return.

This is the key insight: a typical user follows only a handful of big accounts, so the read-time merge is small and bounded. Meanwhile the bulk of their feed (normal accounts) was pre-computed. You get cheap reads without the celebrity write explosion. The threshold for "celebrity" (say, follower count above some number) is a tuning knob, not a fixed rule.

Decision · Hybrid fanout: push for the many, pull for the few.

Pure push can't survive a 100M-follower post. Pure pull can't survive hundreds of millions of expensive feed reads. The hybrid bounds both: write amplification is capped because celebrities don't fan out, and read cost is bounded because you only pull from the small number of big accounts a user follows. The cost is a more complex code path with two regimes, which is well worth it.
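The two regimes can be sketched in a few lines. This is an illustration under stated assumptions, not a production design: `CELEBRITY_THRESHOLD`, the dict-based stores, and every function name here are hypothetical, and the sketch ignores what happens when an account crosses the threshold between posts.

```python
import heapq
from collections import defaultdict

CELEBRITY_THRESHOLD = 1000  # hypothetical tuning knob, not a recommended value

followers = defaultdict(set)         # author_id -> follower ids
following = defaultdict(set)         # user_id -> followed author ids
timelines = defaultdict(list)        # user_id -> pushed (created_at, post_id)
celebrity_posts = defaultdict(list)  # author_id -> (created_at, post_id)

def follow(user_id, author_id):
    followers[author_id].add(user_id)
    following[user_id].add(author_id)

def is_celebrity(author_id):
    return len(followers[author_id]) >= CELEBRITY_THRESHOLD

def publish(author_id, post_id, created_at):
    """Push for normal accounts; store once for celebrities. Returns writes."""
    entry = (created_at, post_id)
    if is_celebrity(author_id):
        celebrity_posts[author_id].append(entry)
        return 1
    for follower_id in followers[author_id]:
        timelines[follower_id].append(entry)
    return len(followers[author_id])

def read_feed(user_id, limit=20):
    """Merge the pre-built timeline with posts pulled from followed celebrities."""
    candidates = list(timelines[user_id])
    for author_id in following[user_id]:
        if is_celebrity(author_id):
            candidates.extend(celebrity_posts[author_id])
    newest = heapq.nlargest(limit, candidates)  # newest first by created_at
    return [post_id for _, post_id in newest]

for i in range(1500):
    follow(f"fan{i}", "star")
follow("fan0", "friend")
follow("fan1", "friend")

friend_writes = publish("friend", "f1", created_at=10)
star_writes = publish("star", "s1", created_at=20)
print(friend_writes, star_writes)
print(read_feed("fan0"))
```

The celebrity post costs one write regardless of audience size, and the read-time merge only touches the celebrities a given user follows.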

Where the timeline lives

A user's pre-built timeline is a hot, frequently-read list. It belongs in a fast store, not the main database.

The common choice is Redis, with each user's timeline as a capped list or sorted set of post IDs (not the full posts, just IDs). When you build the feed, you read the list of IDs, then fetch the actual post content from a cache or the posts store, hydrating likes and comment counts as you go. Storing only IDs keeps the timeline small and lets you update a post's like count in one place instead of in every copy.

Store IDs in the timeline, not whole posts

If you copied full post content into every follower's timeline, a single edited caption or changed like count would need updating in millions of places. Storing only post IDs means the timeline is a lightweight list of pointers; the mutable data (likes, comments) lives once and is fetched at read time. Cap each timeline at a few hundred entries, since nobody scrolls back thousands of posts.
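Here is a small in-memory sketch of an ID-only, capped timeline with read-time hydration. Plain Python dicts and lists stand in for Redis and the posts store; with Redis itself, the prepend-and-trim step would roughly correspond to `LPUSH` followed by `LTRIM` on a list, or `ZADD` plus `ZREMRANGEBYRANK` on a sorted set. The cap value, field names, and function names are assumptions.

```python
MAX_TIMELINE_LEN = 300  # hypothetical cap ("a few hundred entries")

# Single source of truth for mutable post data (hypothetical shape).
posts = {
    "p1": {"caption": "sunset", "likes": 10, "comments": 2},
    "p2": {"caption": "coffee", "likes": 3, "comments": 0},
}

timelines = {}  # user_id -> list of post ids, newest first

def push_to_timeline(user_id, post_id):
    """Prepend a post id and trim the list to the cap."""
    timeline = timelines.setdefault(user_id, [])
    timeline.insert(0, post_id)
    del timeline[MAX_TIMELINE_LEN:]

def hydrate(user_id, limit=20):
    """Resolve ids to current post data at read time; skip deleted posts."""
    ids = timelines.get(user_id, [])[:limit]
    return [{"id": pid, **posts[pid]} for pid in ids if pid in posts]

for user in ("alice", "bob"):
    push_to_timeline(user, "p1")
    push_to_timeline(user, "p2")

posts["p1"]["likes"] += 1  # one write, visible in every timeline
print(hydrate("alice"))
print(hydrate("bob"))
```

The like is recorded once in `posts`, yet both hydrated feeds show the new count, because the timelines hold only pointers.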

The posts themselves live in a sharded, append-heavy store keyed by post ID. Older posts are rarely read, so they can move to cheaper storage over time while recent posts stay hot in cache.

Ranking, not just time

Modern feeds aren't strictly newest-first; they're ranked by predicted engagement. That changes the read path: after assembling the candidate set of posts (from push + pull), a ranking step scores each one and reorders.

This has to stay fast, so it's usually two stages. A cheap candidate generation step gathers a few hundred posts (the merged timeline). Then a heavier ranking model scores just those candidates. You never rank all of a user's possible posts, only the small candidate set, which keeps the expensive model bounded. The principle echoes the celebrity merge: do expensive work on a small, bounded set, never on everything.
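The two stages might look like this in miniature. The scoring formula is a made-up stand-in for a real ranking model, and the post fields, the 300-candidate cap, and the function names are all assumptions for illustration.

```python
import heapq

def generate_candidates(timeline, pulled, max_candidates=300):
    """Stage 1 (cheap): merge pushed and pulled posts, keep the most recent."""
    merged = {post["id"]: post for post in timeline + pulled}  # dedupe by id
    return heapq.nlargest(max_candidates, merged.values(),
                          key=lambda post: post["created_at"])

def score(post, now):
    """Stage 2 (stand-in for a heavier model): engagement decayed by age."""
    age_hours = max((now - post["created_at"]) / 3600, 0.0)
    engagement = post["likes"] + 2 * post["comments"]
    return engagement / (1.0 + age_hours)

def rank_feed(timeline, pulled, now, limit=20):
    """Score only the bounded candidate set, then return the top post ids."""
    candidates = generate_candidates(timeline, pulled)
    ranked = sorted(candidates, key=lambda post: score(post, now), reverse=True)
    return [post["id"] for post in ranked[:limit]]

now = 10 * 3600
timeline = [
    {"id": "old_hit", "created_at": 1 * 3600, "likes": 500, "comments": 50},
    {"id": "fresh", "created_at": 10 * 3600, "likes": 5, "comments": 1},
]
pulled = [{"id": "celeb", "created_at": 9 * 3600, "likes": 900, "comments": 100}]
print(rank_feed(timeline, pulled, now))
```

However expensive `score` becomes, it runs at most `max_candidates` times per feed load.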

Consistency and freshness

A feed is a place where eventual consistency is completely fine, and leaning into that buys you scale.

A new post appearing a few seconds late is invisible to users.
Like and comment counts can be slightly stale or approximate.
Timelines can be rebuilt lazily if a cache is lost, by falling back to the pull path.

You design the UX around this: when a user posts, optimistically show their own post immediately even though the fanout is still propagating in the background. Their friends will see it a moment later. Nobody notices the lag because nobody is comparing two phones side by side.
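The lazy rebuild is easy to sketch: treat the timeline as a cache, and on a miss fall back to the pull path. The store names and `read_timeline` are hypothetical, and dicts stand in for the cache and the durable posts store.

```python
import heapq
from itertools import islice

# Hypothetical stores: the cache can lose entries; posts_by_author is durable.
timeline_cache = {}  # user_id -> [(created_at, post_id)], newest first
following = {"dana": ["ana", "bo"]}
posts_by_author = {
    "ana": [(30, "a2"), (10, "a1")],  # each list kept newest-first
    "bo": [(20, "b1")],
}

def rebuild_from_pull(user_id, limit=300):
    """Pull path: merge the newest-first post lists of everyone followed."""
    streams = [posts_by_author.get(a, []) for a in following.get(user_id, [])]
    merged = heapq.merge(*streams, reverse=True)
    return list(islice(merged, limit))

def read_timeline(user_id, limit=20):
    """Serve from cache; on a miss, rebuild lazily and repopulate the cache."""
    timeline = timeline_cache.get(user_id)
    if timeline is None:
        timeline = rebuild_from_pull(user_id)
        timeline_cache[user_id] = timeline
    return [post_id for _, post_id in timeline[:limit]]

first = read_timeline("dana")   # cache miss: rebuilt via the pull path
timeline_cache.pop("dana")      # simulate losing the cache entry
second = read_timeline("dana")  # rebuilt again, same result
print(first, second)
```

Losing the cache costs one slow read per affected user, not lost data, which is why the timeline store can be treated as disposable.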

The cold-start and the inactive-user trap

Two edge cases bite teams. A brand-new user follows nobody, so push gives them an empty feed; you seed it with popular or recommended content (pure pull from a curated set). And fanning out to inactive users wastes enormous write effort on timelines nobody will ever read; many systems skip fanout to long-dormant accounts and rebuild their timeline lazily if they return. Don't pay to build feeds for people who aren't there.
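Both edge cases reduce to a small guard on each path. In this sketch the 30-day dormancy cutoff, the `last_active` map, and the function names are assumptions chosen for illustration.

```python
DORMANT_AFTER_SECONDS = 30 * 24 * 3600  # hypothetical cutoff: 30 days

def fanout_targets(follower_ids, last_active, now):
    """Return only the followers active recently enough to be worth a write."""
    return [
        follower for follower in follower_ids
        # Followers with no recorded activity are treated as dormant.
        if follower in last_active
        and now - last_active[follower] <= DORMANT_AFTER_SECONDS
    ]

def feed_for(user_id, following, timelines, recommended, limit=20):
    """Cold start: a user who follows nobody gets curated content instead."""
    if not following.get(user_id):
        return recommended[:limit]
    return timelines.get(user_id, [])[:limit]

now = 1_000 * 24 * 3600
last_active = {"active": now - 3600, "dormant": now - 90 * 24 * 3600}

targets = fanout_targets(["active", "dormant", "unknown"], last_active, now)
newbie_feed = feed_for("newbie", {}, {}, ["r1", "r2"])
veteran_feed = feed_for("vet", {"vet": ["ana"]}, {"vet": ["p9"]}, ["r1"])
print(targets, newbie_feed, veteran_feed)
```

A dormant user who returns has no pre-built timeline, so their first read would go through a lazy rebuild on the pull path.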

The one idea to take away

The feed comes down to one decision: do the work on write or on read. Push makes reads cheap but explodes on big accounts; pull makes writes cheap but reads expensive. The answer is to split by account size, pre-compute the common case, and do bounded work at read time for the few exceptions. Then store timelines as lightweight lists of IDs in a fast cache, and treat freshness as eventual.

Test yourself

Questions · say the answer out loud before you read it. If you can't, the chapter isn't done.

Q: Why can't you just query posts from everyone a user follows on every feed load?

Because at scale that query runs on hundreds of millions of feed opens, each joining across hundreds or thousands of followed accounts against a posts table with billions of rows. The cost per read is far too high. The fix is to do the work at write time (fanout) so reads become a cheap lookup of a pre-built timeline.

Q: Fanout-on-write vs fanout-on-read, in one line each?

Fanout-on-write copies a post into every follower's timeline when it's posted: fast reads, expensive writes, breaks on celebrities. Fanout-on-read builds the feed by pulling from followed accounts at read time: cheap writes, expensive reads, breaks for users who follow many accounts.

Q: How does the hybrid model handle a 100M-follower celebrity?

It doesn't fan out celebrity posts at all; it stores them once. At read time, a user's feed is their pre-built timeline (pushed posts from normal accounts) merged with recently pulled posts from the few big accounts they follow. Since each user follows only a handful of celebrities, the read-time merge stays small and bounded.

Q: Why store post IDs in the timeline instead of full posts?

So mutable data isn't duplicated. If you copied whole posts, an edited caption or a changed like count would need updating in millions of timelines. Storing only IDs keeps timelines lightweight and lets the post's mutable fields live in one place, fetched and hydrated at read time.

Q: Where should timelines be stored, and why?

In a fast in-memory store like Redis, as capped lists or sorted sets of post IDs per user. Timelines are read constantly and must return in milliseconds, which the main relational database can't do at this volume. The authoritative posts live in a sharded store; the timeline is a hot, cached index into them.

Q: How does ranking stay fast if feeds aren't strictly chronological?

Two stages. Cheap candidate generation assembles a few hundred posts (the merged timeline), then a heavier ranking model scores only those candidates and reorders them. You never rank a user's entire universe of possible posts, only the small candidate set, which bounds the expensive work.

Q: Is it okay for a feed to be eventually consistent?

Yes, and you should lean on it. A post appearing a few seconds late, or slightly stale like counts, is invisible to users. Optimistically show a user their own post immediately while fanout propagates in the background. Eventual consistency is what lets the feed scale; strong consistency here would buy nothing and cost a lot.