The robots.txt file is one of the basics of technical SEO that you should always take care of. It helps you control how search engines crawl your site—so that everything important appears in the search results and everything you don’t want to be shown there is blocked.
Why is controlling your page crawling and indexing so important?
- Saving crawl budget. There’s a limit to how many pages a search bot can process in a given time. To make sure your most important pages are crawled, and recrawled regularly, you should exclude the pages that don’t need to appear in the search results.
- Preventing technical pages from appearing in search. Your store generates a lot of pages for users’ convenience: login, checkout, internal search, and so on. They are crucial for UX but don’t need to rank in search.
- Avoiding duplicate content issues. The technical pages we’ve just mentioned can also create duplication: for instance, different sorting options live at different URLs but show the same products, just in a different order. You don’t want those pages involved in rankings, as search engines don’t appreciate duplicate content.
How can you control your store’s page indexing?
To give value to your important pages and facilitate their indexing by search bots, you should always have an updated and correct sitemap. Plus, take care of internal linking and earn links from external sources to your content so that your pages look more authoritative in the eyes of search engines.
These measures go a long way toward getting your pages ranked in search, although there’s no surefire way to ensure 100% indexing.
What you can control reliably is excluding certain pages you don’t want to appear in search. For this, you can disallow crawling in the robots.txt file or use the noindex directive in the robots meta tag. (Note that Google no longer honors a noindex rule placed inside robots.txt itself, so the two tools serve different purposes: robots.txt controls crawling, while the meta tag controls indexing.) It sounds very technical at first glance, but it’s actually quite easy, especially for Shopify merchants, as the platform automatically takes care of most of the work.
So, what should you noindex on a Shopify store?
For online stores, it makes sense to block from indexing the following types of pages:
- Everything associated with user accounts. Those pages are unique to each customer and not needed in search.
- Everything associated with guest checkout. Even if users don’t log in to their account and are allowed to purchase as guests, pages with checkout steps generated for them are not meant for search.
- Faceted navigation and internal search. As we’ve already mentioned, offering those URLs to search bots will only confuse them, drain your crawling budget, and create duplicate content problems.
- Products you want to hide from search. If you don’t want certain products to be shown in search results—say, out-of-stock items or time-sensitive items that are no longer relevant—you can hide products from search in your Shopify robots.txt file.
Robots.txt in Shopify
To check the robots.txt file that Shopify generates for you automatically, append /robots.txt to your store’s domain:
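A typical auto-generated file looks roughly like this (an abbreviated sketch with a placeholder domain; your store’s actual file will contain more rules):

```
User-agent: *
Disallow: /admin
Disallow: /cart
Disallow: /checkout
Disallow: /account
Disallow: /search
Sitemap: https://your-store.com/sitemap.xml
```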
What does this file usually contain? It addresses a particular search bot (the User-agent field) and gives it crawling directives (Disallow blocks the bot from accessing the specified paths). In the example above, the first set of rules applies to all search bots (the User-agent is set to *), and the Disallow directives forbid crawling of technical pages, such as admin, cart, checkout, and so on.
Until recently, Shopify didn’t offer any flexibility with this file. But in June 2021, merchants gained the ability to edit robots.txt. The predefined rules cover most cases but not all. If you use an app for internal search, it often changes the URL structure, so the default rules don’t apply. Likewise, with faceted navigation, the URL changes with each chosen filter, and the default rules might not account for every variation. You can add more pages and rules to your file, specify more user agents, and so on.
To learn about existing directives you can apply, check out Google’s guide on robots.txt.
Also, note that new rules appear all the time. For example, at the beginning of 2022, Google introduced a new rule that controls the indexing of embedded content: indexifembedded. It applies if your store has content inserted through an iframe or similar HTML tag: combined with noindex, it keeps the resource itself out of the search results while still allowing its content to be indexed as part of the pages that embed it.
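For reference, Google’s documentation pairs indexifembedded with noindex in the meta-tag form (it can also be sent as an X-Robots-Tag HTTP header). A sketch:

```html
<!-- Keep the embedded resource itself out of Google's index... -->
<meta name="googlebot" content="noindex" />
<!-- ...but allow its content to be indexed as part of pages that embed it -->
<meta name="googlebot" content="indexifembedded" />
```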
How to edit your robots.txt on Shopify?
In your theme’s code, you’ll see a bunch of templates (go to Online Store > Themes > click on Actions on your current theme > choose Edit code > go to Templates). The list should contain the robots.txt.liquid file.
If, for some reason, you don’t have the file, you can create it by clicking on the Add new template and choosing robots.txt.
For instance, let’s block an internal search path from crawling. In the template, it will look like this:
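Shopify’s robots.txt.liquid exposes the default rules as Liquid objects, so you append your own rule rather than overwrite the file. A sketch, assuming an app-driven search that lives at URLs containing ?q= (adjust the pattern to your store’s actual search URLs):

```liquid
{% for group in robots.default_groups %}
  {{- group.user_agent }}

  {%- for rule in group.rules -%}
    {{ rule }}
  {%- endfor -%}

  {%- if group.user_agent.value == '*' -%}
    {{ 'Disallow: /*?q=*' }}
  {%- endif -%}

  {{- group.sitemap }}
{% endfor %}
```

This keeps all of Shopify’s default rules intact and adds the extra Disallow line only to the group that targets all user agents.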
Refer to Shopify’s help page on editing robots.txt for more details.
❗ Note that even if a page is disallowed in robots.txt, it can still get indexed if it has links from external sources. So, for instance, if you have an old page that received a decent amount of traffic in the past but is no longer relevant for your store, it’s better to block it with the robots meta tag or remove it completely.
Noindexing Shopify content with the robots meta tag
Besides robots.txt, the noindex directive can be inserted in the <head> section of your theme’s code with the help of the robots meta tag. The tag has the following syntax: <meta name="robots" content="noindex">.
Similarly to how you edit or create your Shopify robots.txt, go to theme.liquid in the Layout section. For example, this is what it will look like if you add a rule for noindexing your /new-collection page:
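A sketch of the snippet, placed inside the <head> section of theme.liquid (new-collection stands in for whatever page handle you want to exclude):

```liquid
{% if handle contains 'new-collection' %}
  <meta name="robots" content="noindex">
{% endif %}
```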
This way, you’ll hide a page from search for good.
❗ Note that you can combine noindex with the follow or nofollow directives. With noindex, follow, the page is kept out of the index, but search bots may still follow the links placed on it; with noindex, nofollow, the page is excluded from the index and its links aren’t followed either.
Noindexing Shopify content with the help of apps
If all of this sounds like too much trouble, there are easier ways to control your page indexing, without having to write a single line of code. Several SEO apps for Shopify will help you hide products from search in your Shopify store or block any other pages.
Take a look at these two:
- Sitemap Noindex SEO Tools ($3.49 per month for all types of pages)
- NoIndexify - Sitemap Manager (free for product, collection, and blog pages; $2.99 per month for other pages: search, pagination, login, etc.)
This is what NoIndexify’s interface looks like—for each page, you can choose a set of directives:
Improve your SEO by improving page indexing
That’s it: we hope you have a better understanding of how Shopify’s robots.txt works and how to use it to your advantage. With the help of robots.txt and robots meta tag, you can improve your control over page indexing, prevent SEO issues, and give more value to your most important pages so that they shine in search and attract more visitors.
If you’re looking for more Shopify SEO tips, check out our SEO guide.