Category: Tips

  • Broken Links Checker: Find 404 links on website

    When a page on a website is unreachable, the visitor sees an error message in the browser. These errors occur for the following reasons:

    The database server isn’t working: All non-static websites store their data in a database. If the database server isn’t working, the page won’t be able to fetch data from the database tables, and either the page will be blank or the web server will respond with a non-200 HTTP status code.

    Rate limiting: A web server may be configured to limit the number of consecutive requests a client/visitor can make to a page. If too many requests are made to a page in a short period of time, the web server will respond with an error (typically HTTP 429).

    The page has been removed: If the webmaster, a user, or a developer has removed the page, the web server will report HTTP status 404. The problem with the 404 status code is that search bots will make several more attempts to crawl the page in the future. To reduce the number of these attempts, you can configure the web server to return HTTP status 410 instead of 404 for the broken links (see the sketch after the list below).

    Other reasons that may make your web server respond with a status code other than HTTP 200 include:

    • DNS issues.
    • Network issues.
    • The visitor is blocked by a firewall.
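
    As a rough illustration of the 410-instead-of-404 idea, here is a minimal Python/Flask sketch. Flask and the REMOVED_PATHS set are my own assumptions for the example; they are not part of Website Crawler or of any particular web server, and the exact configuration depends on the server you use:

    from flask import Flask, request

    app = Flask(__name__)

    # Hypothetical set of URLs that were removed on purpose.
    REMOVED_PATHS = {"/old-pricing", "/discontinued-product"}

    @app.errorhandler(404)
    def not_found(error):
        # Answer with 410 Gone for pages deleted deliberately so that
        # search bots stop re-crawling them; keep 404 for everything else.
        if request.path in REMOVED_PATHS:
            return "This page has been permanently removed.", 410
        return "Page not found.", 404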

    Using Website Crawler as a broken links checker

    [Screenshot: Website Crawler HTTP Status 404 report]

    Website Crawler not only helps you find broken links on your site but also makes you aware of unresponsive pages on your website. Follow the steps below to find broken URLs on your website:

    Step 1: On the homepage, enter the URL of your website in the first text box and the number of URLs you want to check in the second text box, then click the Submit button.

    Step 2: Click the Status button to see the “Crawl Status”. Once Website Crawler finishes crawling your site, enter your email address and then the verification code sent to your inbox. Now, log in to your account.

    Step 3: Once you log in, you’ll see your project name, website URL and the last crawl date. Click the project name to see the reports.

    Step 4: Click the “HTTP Status” link in the left sidebar of the reports interface and click the first drop-down list. Choose the HTTP status code from the list of options displayed on the screen. Once you do so, click the second drop-down list and choose one of the following two options:

    • Internal Links.
    • External Links.

    Click the “Filter” button. Website Crawler will now display the list of URLs that responded with the HTTP status code you selected in the first drop-down list. To see the pages where an internal link was detected by Website Crawler, click the “Source” button and scroll down until you find the “Pages where the link ______ was found” section.

    [Screenshot: internal link sources]

    Conclusion: With Website Crawler, you can not only find broken URLs on your website but also discover pages that respond with a non-200 HTTP status code.
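
    For readers who prefer a programmatic spot check, the sketch below shows the same basic idea in Python. It is a minimal illustration, not how Website Crawler works internally; the start URL is a placeholder, and the requests and BeautifulSoup libraries are assumed to be installed:

    import urllib.parse

    import requests
    from bs4 import BeautifulSoup

    def find_broken_links(start_url, limit=50):
        """Fetch one page and report internal links that do not return HTTP 200."""
        html = requests.get(start_url, timeout=10).text
        soup = BeautifulSoup(html, "html.parser")
        start_host = urllib.parse.urlparse(start_url).netloc
        broken = []
        for anchor in soup.find_all("a", href=True)[:limit]:
            url = urllib.parse.urljoin(start_url, anchor["href"])
            # Only check links that stay on the same host (internal links).
            if urllib.parse.urlparse(url).netloc != start_host:
                continue
            status = requests.head(url, allow_redirects=True, timeout=10).status_code
            if status != 200:
                broken.append((url, status))
        return broken

    if __name__ == "__main__":
        # Placeholder URL: replace with your own site.
        for url, status in find_broken_links("https://example.com"):
            print(status, url)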

  • Google de-indexing your site? Learn how to find the root cause of this issue and fix it

    Google dropping thousands of pages from its index is a nightmare for bloggers, webmasters, developers, and online business owners. One of my sites has around 28,000 pages. Google had indexed around 12,000 pages of this site, but in the last few months, it started dropping pages from its index.

    If you follow SEO news closely, you might know that the Google de-indexing bug has been the talk of the town of late. This bug has affected several large websites. I ignored the de-indexing issue on my own website, thinking that the Google de-indexing bug might be responsible for it. That was a dreadful mistake.

    Google kept dropping pages of my site from its index. A few weeks after spotting the issue, I re-checked the Coverage report in Google Search Console, hoping that Google might have fixed the de-indexing bug. I was shocked to find that the number of indexed pages was now 5,670 (down from 12,000).

    [Screenshot: Google Search Console Coverage report]

    Sitemap

    [Screenshot: sitemap]

    Did the Google de-indexing bug affect my site?

    No, it was a technical issue.

    How I found and fixed the de-indexing issue

    I ran Website Crawler on the affected site and then logged into my account. The first report I checked was the “Meta Robots” tag report. I suspected that the pages were being de-indexed because one of my website’s functions was injecting a meta robots noindex tag into the site’s header, but I was wrong: this report was clean. Then, I opened the “HTTP Status” report to see whether all the pages on the site were working or not. Every page on the site returned HTTP status 200. The next report I checked was the “Canonical Links” report. When I opened it, I was shocked to find that several thousand pages of the affected website had an invalid canonical tag.

    A few days after I fixed the issue, Google started indexing the de-indexed pages again.

    [Screenshot: after fixing the issue]

    Tip: If Website Crawler’s Canonical Links report interface displays false instead of true in the 3rd column, there’s a canonical link issue on the page shown in the same row. See the screenshot below:

    What does the report look like?

    [Screenshot: Canonical Links checker report]

    The issue on my site

    The valid syntax for canonical links is as follows:

    <link rel="canonical" href="" />

    I mistakenly used “value” instead of “href”; the canonical tag on my site looked like this:

    <link rel="canonical" value="" />

    The “value” attribute didn’t make sense and confused Googlebot, Bingbot, and other search bots. I fixed the issue and re-submitted the sitemap. Google started re-including the dropped pages (see the 3rd screenshot from the top).

    Conclusion: If Google is de-indexing hundreds or thousands of pages of your website, check the canonical and robots meta tags on your pages. The issue may not be on Google’s side; a technical error like the one I’ve described above may be responsible.
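
    If you want to spot-check these two tags yourself, the minimal Python sketch below fetches a page and flags a canonical link that lacks an href attribute (like the broken tag above) as well as a meta robots noindex tag. It is an illustrative sketch, not Website Crawler’s implementation; the URL is a placeholder, and requests and BeautifulSoup are assumed to be installed:

    import requests
    from bs4 import BeautifulSoup

    def audit_page(url):
        """Flag the two issues discussed above on a single page."""
        soup = BeautifulSoup(requests.get(url, timeout=10).text, "html.parser")
        problems = []

        # A canonical tag without an href attribute (for example one that
        # uses value="..." instead) is invalid and can confuse search bots.
        canonical = soup.find("link", rel="canonical")
        if canonical is None or not canonical.get("href"):
            problems.append("missing or invalid canonical link")

        # A meta robots noindex tag tells search bots to drop the page.
        robots = soup.find("meta", attrs={"name": "robots"})
        if robots and "noindex" in robots.get("content", "").lower():
            problems.append("meta robots noindex is set")

        return problems

    if __name__ == "__main__":
        # Placeholder URL: replace with a page from your own site.
        print(audit_page("https://example.com/some-page"))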