Broken Links Checker: Find 404 Links on Your Website


When a page of a website is unreachable, the visitor sees an error message in the browser tab instead of the page. This can happen for several reasons:

The database server isn’t working: Most dynamic (non-static) websites store their data in a database. If the database server is down, the page can’t fetch data from the database tables, so it either renders blank or the web server responds with a status code other than HTTP 200, such as 500 (Internal Server Error) or 503 (Service Unavailable).

Rate limiting: A web server may be configured to limit how many requests a client can make within a given time window. If a client sends too many requests in a short period, the server rejects the extra ones with an error, usually HTTP 429 (Too Many Requests).
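
To illustrate the idea, here is a minimal Python sketch of a sliding-window rate limiter that answers 429 once a client exceeds its quota. The class name, the per-client key and the limits are hypothetical; production setups usually enforce this in the web server, load balancer or CDN configuration rather than in application code like this.

```python
from __future__ import annotations

import time
from collections import defaultdict, deque


class SlidingWindowRateLimiter:
    """Allow at most `max_requests` per client within `window_seconds` (illustrative only)."""

    def __init__(self, max_requests: int, window_seconds: float) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        # Timestamps of recently accepted requests, keyed by a client id such as an IP address.
        self._history: dict[str, deque[float]] = defaultdict(deque)

    def status_for(self, client: str, now: float | None = None) -> int:
        """Return 200 if the request is allowed, 429 if the client is over its limit."""
        now = time.monotonic() if now is None else now
        history = self._history[client]
        # Forget requests that have slid out of the time window.
        while history and now - history[0] >= self.window_seconds:
            history.popleft()
        if len(history) >= self.max_requests:
            return 429  # Too Many Requests; rejected requests are not recorded
        history.append(now)
        return 200


if __name__ == "__main__":
    limiter = SlidingWindowRateLimiter(max_requests=3, window_seconds=1.0)
    # Five requests 0.1 s apart from the same client: the 4th and 5th are rejected.
    print([limiter.status_for("203.0.113.7", now=0.1 * i) for i in range(5)])
```

A crawler that hits a site this fast would see 429 responses for pages that are otherwise fine, which is why a burst of such codes in a crawl report usually points at rate limiting rather than broken pages.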

The page has been removed: If the webmaster, a user or a developer has deleted the page, the web server responds with HTTP status 404 (Not Found). The problem with 404 is that search engine bots will try to crawl the page again several times in the future. To reduce these attempts, you can configure the web server to return HTTP status 410 (Gone) instead of 404 for pages you have removed permanently.
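
Most sites set this up in the web server configuration (nginx, Apache and so on), but the decision itself is simple. Below is a minimal sketch of a Python WSGI application, assuming a hypothetical hard-coded set of removed paths, that answers 410 for those paths, 200 for known pages and 404 for everything else.

```python
# Hypothetical paths that were deliberately and permanently removed from the site.
REMOVED_PATHS = {"/old-pricing", "/2019-promo"}
# Hypothetical pages that still exist, with placeholder bodies.
LIVE_PAGES = {"/": b"Home", "/about": b"About us"}


def app(environ, start_response):
    """WSGI app: 200 for live pages, 410 for removed pages, 404 for unknown paths."""
    path = environ.get("PATH_INFO", "/")
    if path in LIVE_PAGES:
        status, body = "200 OK", LIVE_PAGES[path]
    elif path in REMOVED_PATHS:
        # 410 signals that the page is gone for good, not merely missing.
        status, body = "410 Gone", b"This page has been permanently removed."
    else:
        status, body = "404 Not Found", b"Page not found."
    start_response(status, [("Content-Type", "text/plain; charset=utf-8"),
                            ("Content-Length", str(len(body)))])
    return [body]


if __name__ == "__main__":
    from wsgiref.util import setup_testing_defaults

    # Call the app directly with a fake request for each path and print the status.
    for path in ("/about", "/old-pricing", "/missing"):
        environ = {}
        setup_testing_defaults(environ)
        environ["PATH_INFO"] = path
        app(environ, lambda status, headers: print(path, "->", status))
```

To serve it for real, pass `app` to `wsgiref.simple_server.make_server(host, port, app)` and call `serve_forever()` on the returned server.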

Other reasons a page may fail to load or return a status code other than HTTP 200 include:

  • DNS issues.
  • Network issues.
  • The visitor is blocked by a firewall.
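
When you check links yourself, it helps to separate "the server answered with an error" from "the request never reached a server". Here is a minimal sketch using only Python's standard library; the category strings are made up for illustration, and because a firewall block can surface as a timeout, a refused connection or an HTTP 403, the classification is only approximate.

```python
import socket
import urllib.error
import urllib.request


def classify_failure(exc: Exception) -> str:
    """Map an exception raised while fetching a URL to a rough failure category."""
    # HTTPError subclasses URLError, so it must be checked first.
    if isinstance(exc, urllib.error.HTTPError):
        return f"http {exc.code}"  # the server answered, e.g. with 404, 410 or 429
    reason = getattr(exc, "reason", exc)
    if isinstance(reason, socket.gaierror):
        return "dns"  # the host name could not be resolved
    if isinstance(exc, OSError):
        return "network"  # refused connection, reset, timeout and similar
    return "unknown"  # e.g. ValueError for a malformed URL


def check_url(url: str, timeout: float = 10.0) -> str:
    """Fetch `url` with a HEAD request and return "http <code>" or a failure category.

    urlopen follows redirects, so a 301/302 reports the final status. Some servers
    reject HEAD (often with 405); a more robust checker would retry with GET.
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return f"http {response.status}"
    except Exception as exc:  # every failure is classified rather than raised
        return classify_failure(exc)
```

For a reachable page, `check_url("https://example.com/")` returns a string such as `"http 200"`, while an unresolvable host name yields `"dns"`.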

Using Website Crawler as a broken links checker


Website Crawler not only finds broken links on your site but also makes you aware of unresponsive pages. Follow the steps below to find broken URLs on your website:

Step 1: On the homepage, enter your website’s URL in the first text box and the number of URLs you want to check in the second text box, then click the Submit button.

Step 2: Click the Status button to see the “Crawl Status”. Once Website Crawler finishes crawling your site, enter your email address and then the verification code sent to your inbox. Now, log in to your account.

Step 3: Once you log in, you’ll see your project name, website URL and the last crawl date. Click the project name to see the reports.

Step 4: Click the “HTTP Status” link in the left sidebar of the reports interface and open the first drop-down list. Choose the HTTP status code you want to check from the options displayed. Then open the second drop-down list and choose one of the following two options:

  • Internal Links.
  • External Links.

Click the “Filter” button. Website Crawler will now list the URLs that responded with the HTTP status code you selected in the first drop-down list. To see the pages on which Website Crawler found a link, click the “Source” button and scroll down until you reach the “Pages where the link ______ was found” section.
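
Website Crawler does this filtering for you, and its internals aren’t described here. As a rough illustration of the same idea, here is a hypothetical Python sketch that, given HTML you have already fetched and the status code each link returned, lists the links matching one status code and scope (internal or external) and records the source pages each link appears on. All function names and the sample data are made up for this example.

```python
from __future__ import annotations

from collections import defaultdict
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collect the absolute http(s) targets of every <a href> on one page."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # Resolve relative links against the page URL and drop any #fragment.
        absolute = urljoin(self.base_url, href).split("#", 1)[0]
        if urlparse(absolute).scheme in ("http", "https"):  # skip mailto:, tel:, etc.
            self.links.append(absolute)


def build_source_index(pages: dict[str, str]) -> dict[str, set[str]]:
    """Map every linked URL to the set of crawled pages it was found on."""
    sources: dict[str, set[str]] = defaultdict(set)
    for page_url, html in pages.items():
        extractor = LinkExtractor(page_url)
        extractor.feed(html)
        for link in extractor.links:
            sources[link].add(page_url)
    return dict(sources)


def filter_links(statuses: dict[str, int], site_url: str,
                 status_code: int, scope: str) -> list[str]:
    """Return the links that returned `status_code`, limited to "internal" or "external"."""
    if scope not in ("internal", "external"):
        raise ValueError("scope must be 'internal' or 'external'")
    site_host = urlparse(site_url).netloc.lower()
    matches = []
    for link, code in statuses.items():
        is_internal = urlparse(link).netloc.lower() == site_host
        if code == status_code and is_internal == (scope == "internal"):
            matches.append(link)
    return sorted(matches)


if __name__ == "__main__":
    # Hypothetical crawl data: fetched HTML per page, and the status each link returned.
    pages = {
        "https://example.com/": '<a href="/about">About</a> <a href="/old">Old</a>',
        "https://example.com/about": '<a href="/old#top">Old</a> <a href="https://other.example/x">X</a>',
    }
    statuses = {
        "https://example.com/about": 200,
        "https://example.com/old": 404,
        "https://other.example/x": 404,
    }
    sources = build_source_index(pages)
    for link in filter_links(statuses, "https://example.com/", 404, "internal"):
        print(link, "found on", sorted(sources.get(link, set())))
```

Running it prints the internal 404 link together with both pages that link to it. Host names are compared exactly, so `www.example.com` and `example.com` count as different sites here; a real checker would need to decide how to treat such variants.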


Conclusion: With Website Crawler, you can not only find broken URLs on your website but also discover pages that respond with non-200 HTTP status codes.

By Pramod

Pramod is the developer/founder of Website Crawler. He loves building web applications.
