mattpointblank’s Twitter Archive, № 16,450

      1. Here is a story in three charts: Two weeks ago, we launched a new search tool at work with 200,000+ results pages. A few days ago, the CPU usage for the database powering the search started to look like this: (i.e. spiky, with dangerously high usage)
    1. …in reply to @mattpointblank
      We spent a few days tuning performance, adding indexes, tweaking queries etc. No difference. We looked at the logs – lots of Googlebot traffic. We checked our Google Search Console. Googlebot was very, very hungry. And we'd just given it almost half a million new pages.
  1. …in reply to @mattpointblank
    We realised that each of the pages it was crawling made a call to a "related" endpoint which was slow/expensive to render. We quickly disabled this. Here's the almost immediate impact on CPU:
    1. …in reply to @mattpointblank
      Moral of the story: assume all robots are ceaseless, greed-powered machines intent on grinding all of your servers into dust, and code appropriately.
      1. …in reply to @mattpointblank
        The search tool in question: biglotteryfund.org.uk/funding/grants
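
One way to "code appropriately" for the situation in this thread is to skip the expensive "related" lookup when the request looks like it comes from a crawler. This is only a rough sketch and not what was actually deployed: the thread says the call was simply disabled. The names `is_crawler`, `render_page`, `fetch_related` and the User-Agent pattern are all hypothetical, and User-Agent headers can be spoofed, so treat this as a load-shedding heuristic rather than a guarantee.

```python
import re

# Assumption: substrings that often appear in crawler User-Agent headers.
# This is an illustrative pattern, not an authoritative list.
CRAWLER_PATTERN = re.compile(r"bot|crawler|spider", re.IGNORECASE)


def is_crawler(user_agent):
    """Return True if the User-Agent string looks like an automated crawler."""
    return bool(user_agent) and CRAWLER_PATTERN.search(user_agent) is not None


def render_page(result, user_agent, fetch_related):
    """Build the page data, skipping the costly related lookup for crawlers.

    fetch_related is a hypothetical callable standing in for the slow
    "related" endpoint mentioned in the thread.
    """
    page = {"result": result, "related": []}
    if not is_crawler(user_agent):
        page["related"] = fetch_related(result)
    return page
```

With this in place, a request whose User-Agent contains "Googlebot" gets the page without the related lookup ever running, while an ordinary browser request still triggers it.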