Recently, I wrote an article about my journey in learning about robots.txt and what it means for the data rights over what I write on my blog. I was confident that I wanted to ban all crawlers from my website. It turned out there was an unintended consequence that I had not accounted for.
Ever since I changed my robots.txt file, my LinkedIn posts no longer showed a preview of the linked article. I was not sure what the issue was at first, since it had worked just fine before. On top of that, I noticed that LinkedIn’s algorithm had started serving my posts to fewer and fewer connections. I assumed it might be a temporary problem, but over the next two weeks the post previews did not come back.
This is what my LinkedIn posts used to look like: no link preview and little engagement with the post
After a quick search, I found a tool called LinkedIn Post Inspector. It shows everything you would want to know about a link you are about to share on the platform. I plugged my recent article into the tool, and it revealed why I could no longer see the previews: the robots.txt file had a directive that did not allow the LinkedIn bot to scrape my pages. This was the error message that I got:
Fair enough! Thinking about it now, it makes so much sense, and I should have seen it ahead of time. Whenever you share a link on LinkedIn or another social media platform, the platform needs to request the page you are linking to. Its bot needs that page to read the meta tags used to build the preview. Those are known as Open Graph meta tags, and they come from the Open Graph protocol, which was originally created at Facebook.
According to the OGP website,
The Open Graph protocol enables any web page to become a rich object in a social graph. For instance, this is used on Facebook to allow any web page to have the same functionality as any other object on Facebook.
There are only a few required tags needed to implement this for your web resource. But this protocol is highly extensible and allows for all sorts of media to be used as meta information about your page.
To turn your web pages into graph objects, you need to add basic metadata to your page. We’ve based the initial version of the protocol on RDFa which means that you’ll place additional <meta> tags in the <head> of your web page. The four required properties for every page are:
og:title - The title of your object as it should appear within the graph, e.g., “The Rock”.
og:type - The type of your object, e.g., “video.movie”. Depending on the type you specify, other properties may also be required.
og:image - An image URL which should represent your object within the graph.
og:url - The canonical URL of your object that will be used as its permanent ID in the graph, e.g., “https://www.imdb.com/title/tt0117500/”.
This simple yet powerful protocol is what makes the posts across all of the Internet more presentable and informative.
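To make the four required properties concrete, here is a minimal sketch of how a preview bot might read them from a page. It uses only Python's standard library; the class name `OpenGraphParser`, the helper `missing_required`, and the sample page (including the placeholder image URL) are made up for illustration and are not how LinkedIn actually implements its crawler.

```python
from html.parser import HTMLParser

# The four properties the Open Graph protocol lists as required.
REQUIRED = ("og:title", "og:type", "og:image", "og:url")


class OpenGraphParser(HTMLParser):
    """Collects <meta property="og:..." content="..."> pairs from a page."""

    def __init__(self):
        super().__init__()
        self.tags = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        prop = attributes.get("property") or ""
        if prop.startswith("og:") and "content" in attributes:
            # Keep the first value seen for each property.
            self.tags.setdefault(prop, attributes["content"])


def missing_required(html):
    """Return the required Open Graph properties that the page lacks."""
    parser = OpenGraphParser()
    parser.feed(html)
    return [name for name in REQUIRED if name not in parser.tags]


# Hypothetical page; the image URL is a placeholder.
SAMPLE_PAGE = """
<html>
  <head>
    <title>The Rock (1996)</title>
    <meta property="og:title" content="The Rock" />
    <meta property="og:type" content="video.movie" />
    <meta property="og:url" content="https://www.imdb.com/title/tt0117500/" />
    <meta property="og:image" content="https://example.com/rock.jpg" />
  </head>
</html>
"""

print(missing_required(SAMPLE_PAGE))
```

If a bot is not allowed to fetch the page at all, it never gets to this step, which is why the preview disappears no matter how complete the tags are.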
To turn this around, I updated my robots.txt to allow LinkedInBot to crawl my resources. If I want to start posting on other social media sites and see previews for those posts, I will need to include their bots here as well. My current configuration looks like this:
User-agent: LinkedInBot
Allow: /
User-agent: *
Disallow: /
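A change like this can be checked locally before it is deployed. The sketch below feeds the same rules to Python's standard `urllib.robotparser` and asks which user agents may fetch an article. The helper `allowed_agents`, the article URL, and the crawler name `SomeOtherBot` are made up for illustration. It shows how Python's parser reads the rules, which is a reasonable approximation but not a guarantee of how any particular crawler behaves.

```python
from urllib.robotparser import RobotFileParser

# The same rules as the configuration above.
ROBOTS_TXT = """\
User-agent: LinkedInBot
Allow: /
User-agent: *
Disallow: /
"""


def allowed_agents(robots_txt, agents, url):
    """Map each user-agent name to whether it may fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


# Hypothetical article URL and a made-up crawler name.
ARTICLE_URL = "https://example.com/blog/my-article"
result = allowed_agents(ROBOTS_TXT, ["LinkedInBot", "SomeOtherBot"], ARTICLE_URL)
print(result)  # {'LinkedInBot': True, 'SomeOtherBot': False}
```

Running a check like this against the original block-everything file would have shown LinkedInBot as disallowed before the previews went missing.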
It turns out that drastic measures like blocking all crawlers can make content presentation suffer. My mistake when I made that change was not thoroughly testing the impact of blocking all traffic like that. Running into the issue led me to learn more about the Open Graph protocol and tools like LinkedIn Post Inspector, which told me what the problem was.
When working on any feature, no matter how small, make sure that you understand the domain in which you are making changes. As practice shows, that is not always how things work out. Initially, I did not connect the dots between OGP and the robots.txt prohibitions.
Sometimes you have to break a few eggs before you make an omelet. Well… you always have to break some eggs to make an omelet… You get the point.