Earlier this year I built a custom Bluesky feed for tracking all of the NBA posts on Bluesky. At the time I enjoyed the novelty of being able to craft my own social media experience. Customizing the feed meant I could curate the content I wanted to see: no slop or bots, only posts about the NBA, shown chronologically. Turns out other people enjoyed this as well! There seemed to be an audience for curated feeds on Bluesky, so I wanted to build more of these communities.
Building a Data Service
A lot of what was done to support NBA Now was done in the spirit of learning and getting something working quickly. The feed ran on a single server that was responsible for reading the firehose, processing posts, and serving the feed. During high-traffic times the server would fall behind, causing users to see a delay in the feed. The processing rules were written to a JSON file, so anytime I changed the processing rules new code needed to be pushed, which definitely wasn't scalable.
First, I standardized the processing logic for the feeds. A database schema was created to store the feed processing rules and feed membership. Moving the processing logic into a database removed the messiness of managing it inside the codebase while also standardizing it across all feeds. Now I can push an update to the rules in the database and the service automatically picks up the changes.
Since the processing rules are relatively static, a Redis service syncs with the database and caches the rules for quick access. Caching the rules in Redis eliminates millions of database calls each day! The Redis service also caches other frequently accessed information, like recent user activity and post interactions, that is used when processing and serving the feed.
The processing code itself was also pulled into a common library. All of the rules hardcoded into the NBA Now feed were abstracted into shared tooling that any feed can leverage. Every feed is built from the same standard set of tools, which greatly simplifies maintenance: instead of tweaking the code for each feed, I update that feed's processing rules in the database. If a feed needs new capabilities, they can easily be added to the common library.
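A shared rule library might look something like the sketch below. The rule types shown (`keyword`, `regex`, `exclude`) are illustrative stand-ins for whatever the actual library supports, and the all-rules-must-pass semantics is an assumption for the example.

```python
import re


def matches_rule(post_text, rule):
    """Evaluate one rule (as stored in the database) against a post's text.

    The rule shapes here are hypothetical examples, not the real schema.
    """
    text = post_text.lower()
    if rule["type"] == "keyword":
        return rule["value"].lower() in text
    if rule["type"] == "regex":
        return re.search(rule["value"], post_text, re.IGNORECASE) is not None
    if rule["type"] == "exclude":
        return rule["value"].lower() not in text
    raise ValueError(f"unknown rule type: {rule['type']}")


def include_post(post_text, rules):
    # A post joins the feed only if it satisfies every rule.
    return all(matches_rule(post_text, r) for r in rules)
```

Because the functions only consume rule dictionaries, adding a capability to every feed means adding one branch to the library and then enabling it per-feed through the database.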
To manage the rules for each feed a simple React UI was developed. The UI provides some community management capabilities like creating and deleting feed logic, as well as blocking users from a feed. Right now the UI is capable, but not refined. More development to come here!
Microservice Architecture
Expanding to support other feeds meant I needed to rethink the entire infrastructure. Ultimately I decided on a microservice architecture so I could scale different services independently as needed. There are three main services:
The Core Service should be relatively simple. Its only responsibility is to read the Bluesky firehose and publish events to the appropriate RabbitMQ queue for the processing service to consume. The core service should always stay up-to-date with the firehose; falling behind causes errors downstream.
The Processing Service is where the heavy lifting happens. Each feed has its own set of rules for determining if a post should be included in the feed. These rules include text matching, classification models, and user propensity scoring. As more feeds are added, post volume increases, or the rules become more complex, the processing service will need to scale.
Finally, the Feed Service serves the feed, the list of posts that should be displayed, to users on request. This service should scale with the number of users requesting feeds across the entire system.
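In Bluesky's feed generator protocol, the feed service answers `app.bsky.feed.getFeedSkeleton` requests with just post URIs; the client hydrates the full post content itself. A simplified response builder might look like this (the integer-offset cursor is a toy simplification; production feeds typically use timestamp-based cursors):

```python
def feed_skeleton(post_uris, cursor=None, limit=50):
    """Build a getFeedSkeleton-style response from a ranked list of post URIs.

    Returns {"feed": [{"post": uri}, ...]} plus a "cursor" when more
    posts remain. The offset-as-cursor scheme is illustrative only.
    """
    start = int(cursor) if cursor else 0
    page = post_uris[start:start + limit]
    body = {"feed": [{"post": uri} for uri in page]}
    if start + limit < len(post_uris):
        body["cursor"] = str(start + limit)  # client passes this back for the next page
    return body
```

Keeping the response to bare URIs is what makes the feed service cheap to scale: it never fetches or stores post bodies, only ordered identifiers.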
RabbitMQ passes messages between the services. The core service publishes events to a RabbitMQ queue that the processing service consumes. Once the event is published the Core service can continue to read the firehose without waiting for the processing service to finish.
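The fire-and-forget handoff can be sketched in-process with Python's standard-library queue; this is only an analogue of the RabbitMQ setup, with a thread playing the role of the processing service and an uppercase transform standing in for real rule processing.

```python
import queue
import threading

events = queue.Queue()  # stands in for the RabbitMQ queue
processed = []


def core_service(firehose_events):
    # Publish each event and move on immediately -- the core service
    # never waits for the processing service to finish.
    for event in firehose_events:
        events.put(event)
    events.put(None)  # sentinel: end of stream


def processing_service():
    # Consume events at its own pace, independent of the publisher.
    while (event := events.get()) is not None:
        processed.append(event.upper())  # stand-in for real rule processing


consumer = threading.Thread(target=processing_service)
consumer.start()
core_service(["post one", "post two"])
consumer.join()
print(processed)  # -> ['POST ONE', 'POST TWO']
```

The buffering is the point: during traffic spikes the queue absorbs the backlog, so the firehose reader stays current while more processing workers can be added to drain the queue faster.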
Implementing this new microservice architecture has resulted in a more stable and scalable system. The architecture has been running for a month supporting the new feeds. Millions of posts are processed each day, and even during high-traffic events like the NBA Finals there was no noticeable delay in the feed served to users. Each day the various feeds are served up over 10,000 times and growing!
Next Steps
Before getting to the fun flashy new pieces, let's acknowledge there is still some foundational work to do. A couple of hardcoded rules still exist in the codebase that need to be moved into the database and managed there. Some more post-processing logic needs to be implemented to handle quote posts and image contexts.
Now the fun stuff! Building a new feed takes a decent amount of time and effort to get dialed in. For the next round of feeds my goal is to automate away some of the manual effort. I want to provide some high-level guidance for the feed and have the feed built automatically with an AI agent. Over time the feed learns to refine itself and make suggestions for improvements. I'm not entirely sure what this looks like, but I'm excited to explore!
Finally, I'm looking at ways to make the feeds more interactive. Over the past few weeks I've been experimenting with pinning posts to highlight content creators in the community. I want to do more of this, but again I need to figure out how to automate it, since I can't reasonably do this for each feed manually. There are also down times throughout the day when the feed is slow. I'm looking at ways to inject some content into the feed to keep it more lively during those times.
Lots of fun stuff to explore! You can follow me on Bluesky for all the latest updates!