One lingering issue since deploying the NBA Now feed has been disambiguating terms whose meaning depends largely on context. Take "Spurs", which could refer to the San Antonio Spurs in the NBA or Tottenham Hotspur in the EPL. The term by itself is ambiguous, but the rest of the post can often help us infer the meaning. That context, however, is not always available. The classifier does a good job of filtering out non-NBA posts when there is sufficient context, but posts whose text doesn't make the context explicit are much harder to filter.
Consider the post "Spurs are playing great today!" Is the user talking about the San Antonio Spurs or Tottenham Hotspur? It's hard to say from the post text alone. You could use other information: if the post includes an image, you could check whether it shows basketball or soccer. Knowing the user's location could help too, but that's not always available either.
There are plenty of other ambiguous terms as well. "Luka" could be the player for the Lakers or the anime character. If you see the post "Curry is cooking tonight!", is the user talking about the NBA player, or are they excited about the dinner they're making? Hand-written heuristics can't feasibly cover all of these cases.
To solve the problem, I added a User Propensity Score to the processing pipeline. When a post contains an ambiguous term, the pipeline checks the user's propensity to post about the NBA rather than something else. If the user has a history of posting about the NBA, the post is included; otherwise it's excluded. The figure below shows the full processing pipeline.
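As a rough illustration of that check, here is a minimal Python sketch. The term list, the function names, and the boolean `user_has_nba_history` input are all assumptions made for the example, not the feed's actual code.

```python
import re

# Hypothetical set of ambiguous terms; the feed's real list is not shown in this post.
AMBIGUOUS_TERMS = {"spurs", "luka", "curry"}


def contains_ambiguous_term(text: str) -> bool:
    """Return True if any word in the post is one of the ambiguous terms."""
    words = set(re.findall(r"[a-z0-9']+", text.lower()))
    return not words.isdisjoint(AMBIGUOUS_TERMS)


def should_include(text: str, user_has_nba_history: bool) -> bool:
    """Gate a post that the classifier has already accepted (an assumption).

    Posts without an ambiguous term pass through unchanged. Posts with one
    are kept only if the user has a history of posting about the NBA.
    """
    if not contains_ambiguous_term(text):
        return True
    return user_has_nba_history
```

With this sketch, `should_include("Spurs are playing great today!", False)` rejects the post, while the same text from a user with NBA history is kept.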
To calculate the score, I created a table in the database to track each user's usage of NBA-related keywords. If a user's post history includes only a narrow set of these keywords, it's safe to assume the user is talking about something that isn't NBA-related (e.g. they're likely a Tottenham Hotspur fan who only ever mentions "Spurs"). A variety of keywords in the user's history is a good indicator that the user is an NBA fan and that their posts should be included in the feed.
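One way to picture the keyword-variety idea is the sketch below. It keeps the per-user counts in memory rather than in a database table, and the keyword list, the `KeywordTracker` class, and the threshold of three distinct keywords are assumptions chosen for illustration; the post does not state the actual values.

```python
import re
from collections import defaultdict

# Hypothetical keyword list and threshold; the real values are not given in the post.
NBA_KEYWORDS = {"spurs", "lakers", "celtics", "nba", "playoffs", "curry", "luka"}
MIN_DISTINCT_KEYWORDS = 3


class KeywordTracker:
    """In-memory stand-in for the per-user keyword usage table."""

    def __init__(self) -> None:
        # user_id -> keyword -> number of times the user has used it
        self._counts = defaultdict(lambda: defaultdict(int))

    def record_post(self, user_id: str, text: str) -> None:
        """Count every NBA-related keyword that appears in the post."""
        for word in re.findall(r"[a-z0-9']+", text.lower()):
            if word in NBA_KEYWORDS:
                self._counts[user_id][word] += 1

    def propensity_score(self, user_id: str) -> int:
        """Score a user by how many distinct keywords they have used."""
        return len(self._counts.get(user_id, {}))

    def is_nba_fan(self, user_id: str) -> bool:
        """A varied keyword history suggests the user posts about the NBA."""
        return self.propensity_score(user_id) >= MIN_DISTINCT_KEYWORDS
```

A user who has only ever posted "Spurs" scores 1 no matter how often they post, so they stay below the threshold, while a user who has mentioned several different teams or league terms clears it.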
Since I implemented the User Propensity Score, the number of non-NBA posts in the feed has dropped significantly. The best test came with the first Tottenham game after adding this check. During the game, the number of "Spurs" posts referencing the EPL team was reduced by 93%. Dozens of Tottenham Hotspur posts were excluded by the User Propensity Score, keeping the feed focused on NBA content. The solution was effective, but there's still room for improvement. The handful of posts that crept into the feed came from users who had also posted about the NBA a couple of times, so the way the score is used needs to be refined to be more accurate.
Wrapping Up
The User Propensity Score has proven to be a valuable addition to the NBA Now feed, helping to maintain its focus on basketball content while filtering out unrelated posts. While there's still room for improvement, the algorithm is doing its job well - delivering NBA content to basketball fans on Bluesky.
Want to see it in action? Check out the NBA Now feed on Bluesky to follow all the NBA action. Whether you're tracking your favorite team's progress or following league-wide developments, the feed provides a curated stream of basketball content from the Bluesky community. And if you're already using the feed, I'd love to hear your feedback on how it's working for you!