Website Crawler is a SaaS (Software as a Service) that you can use to crawl and analyze up to 25 pages for free in real time. It is robust and fast, and can generate a JSON or CSV file from the extracted data. You can run the crawler as many times as you want, up to the set daily limit.
Website Crawler displays the analysis in pie charts, making it easy to find areas that need optimization. It generates pie charts for loading times, internal/external links, HTTP status codes, content, and more. If you spot an issue, re-run a crawl to see the latest results; our charts are updated in real time.
With WebsiteCrawler, you can extract data from websites with just a click of a button. Once a crawl job is complete, your data is available for download instantly as a CSV, Markdown, or JSON file. We also offer an API that returns data in JSON format if you want structured data for your project or software. You can configure the "custom tag settings" to scrape information of your choice from web pages.
Visualizing the data is just the first step in making your website better. We provide detailed analysis reports so that you can fix issues and improve your site's search presence. You can filter the data with many conditions. From finding internal redirects to pages with duplicate content, we provide many reports that will help you improve your site.
Configure the crawl behavior by selecting the crawler type, changing the number of URLs to process per minute, and introducing a delay. Track credit usage and see crawl history. Add your branding to PDF reports, and activate modes like "Markdown". Configure email alerts, blacklist/whitelist directives, and more, all from one place.
link_off Broken Links: Find unreachable internal and external links on your site and identify redirect chains. This SaaS checks the HTTP status code of every URL it analyzes and makes you aware of unreachable URLs.
bolt Load time: See the loading time of every analyzed page. Filter the data to find slow and fast pages in no time. We also support PageSpeed Insights metrics: enter your PageSpeed API key and start analyzing FCP, TBT, LCP, and other scores (see the sketch after this feature list).
file_copy Duplicate titles, meta tags: Multiple title or meta description tags can confuse search bots, especially those indexing your pages for ranking in search engines. With Website Crawler, you can easily find the pages that have multiple title or meta tags.
broken_image Missing alt tags: Search bots index the images they find on the internet and display them in their image search tools. If an image does not have an alt tag, it may not rank for search keywords. This SaaS has an "images report" that you can use to find pages whose pictures have, or are missing, alt tags.
account_tree XML Sitemap: Generate an XML sitemap for your site with a click of a button. Exclude URLs from the sitemap, add a priority, or specify a "changefrequency" for URLs. If you're using a CMS or a custom-built site that does not have a sitemap, use this feature.
file_export Export data: Export or download the data displayed in the reports section to a PDF, CSV, or spreadsheet file. There's also an option to export the entire website's data to a file. Website Crawler can also generate an LLM-ready structured data format, i.e. a JSON file, from the scraped data.
javascript JavaScript crawling: Crawl JavaScript-heavy single-page applications and sites. WebsiteCrawler executes JS to capture dynamically generated or loaded content.
domain_verification SSL certificate monitoring: Check SSL certificate expiry dates in the dashboard for timely renewals and to prevent SSL-related warnings and downtime. Get email alerts shortly before an SSL certificate expires.
schedule Schedule crawls: Forget manual crawl runs. Automate data extraction and analysis by scheduling a crawl. This platform will automatically start analyzing your website at a time of your choice and send an email once the job is over.
security Analyze security headers: Audit security headers (and other response headers) across the site. Filter by a specific response header to find issues in your security configuration.
security Compare crawls: Identify content changes over time. Select two timestamps and see which titles, descriptions, paragraphs, etc. have changed between them.
link Canonical link issues: One of the major reasons pages fail to rank despite having good content is improper canonical links. Website Crawler identifies invalid canonical links and displays them.
format_h1 Audit heading tags: Find pages without h1 to h5 tags or with duplicate h1 to h5 tags. Filter headings by text or word and analyze the heading structure of each analyzed page.
network_node Internal/external link counts: See the number of internal and external links on each page along with their robots.txt status. Filter the list by URL count with just one click of a button.
abc Thin content: A site's rankings can tank after an algorithm update if it has several pages with thin content. Finding thin content on a site is a breeze with this SaaS.
acute Fast: WebsiteCrawler.org is fast. It can crawl thousands of pages within a few minutes, and it can execute scraping/crawling tasks in the background while you work on other things.
format_h1 Custom data: Configure the crawler to extract specific data from a page using CSS-style selectors. See the scraped data report or export the data to a PDF, CSV, or JSON file.
article Log files: See useful data from your access log files with our log file analyzer [beta].
spellcheck Bulk check spelling mistakes: WebsiteCrawler can bulk-check hundreds of articles for spelling mistakes with one click of a button. After identifying the mistakes, it will make you aware of the pages with spelling errors.
content_copy Find duplicate content: Find similar or duplicate pages across the site and see their similarity scores. Configure this check by page type, i.e. indexable or non-indexable.
counter_8 See readability scores: See how readable the text content on your site is. The Flesch-Kincaid score for the paragraphs on each crawled page is calculated and displayed along with the word count.
search_check_2 GSC Integration: Connect WebsiteCrawler with your Google Search Console account and find your top-performing keywords using our powerful filters. You can also track the performance of keywords of your choice.
image Page/image size: Identify large images and pages that could be making your site slow. This platform records the size of each page it has processed and of every image it encountered while analyzing your site.
track_changes Track issues: Issues are stored by date. See how many issues you've fixed and monitor improvements over time. Export the issue list to a PDF file to share with your technical team.
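For the load-time item above, the PageSpeed API key you enter comes from Google, and the same key also works directly against Google's public PageSpeed Insights API. The snippet below is a minimal sketch of such a direct query, independent of WebsiteCrawler; audit names and response fields may vary by API version.

```python
# Minimal sketch: query Google's PageSpeed Insights API (v5) with an API key.
# Independent of WebsiteCrawler; audit names may differ across API versions.
import requests

API_KEY = "your-pagespeed-api-key"
ENDPOINT = "https://www.googleapis.com/pagespeedonline/v5/runPagespeed"
params = {"url": "https://example.com", "key": API_KEY, "strategy": "mobile"}

data = requests.get(ENDPOINT, params=params, timeout=60).json()
audits = data["lighthouseResult"]["audits"]
for metric in ("first-contentful-paint", "largest-contentful-paint", "total-blocking-time"):
    print(metric, audits[metric]["displayValue"])
```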
WebsiteCrawler is a SaaS (Software as a Service) that crawls every link it finds on the domain you enter. It does not overwhelm any server, but it does the job like a pro.
Enter a non-redirecting, reachable website domain (include https, www, http, etc.) and the number of URLs you want this SaaS to analyze, then click the submit button. Once the crawler gets into action, you will see the list of URLs that have been analyzed. This list is updated every 2 seconds (for paid users) or every 10 to 15 seconds (for free users). Once the number of links in the list equals the limit you've entered, you will see a form with the option to log in with your Google account or register a new account. Proceed with the option of your choice to see the dashboard.
WebsiteCrawler can render JS-heavy sites, so it supports every publicly reachable website. It does not automatically fill and submit forms; it works only with publicly available information on HTML pages.
We have set a daily limit of 25 URLs for free plan users. For paid users, this limit is increased to 1000+. How does this feature work? WebsiteCrawler keeps a record of the total number of links it has crawled for a domain. Once the daily threshold is reached, if you enter the domain and limit in the above form and click the "crawl my site now" button, you will see an error.
Although this SaaS supports JS, some pages of a site may be poorly linked, and that is when this feature comes in handy. To make WebsiteCrawler crawl a sitemap, enter the URL of the sitemap in the "xml sitemap" text box available above. Websitecrawler.org will extract each URL from the sitemap file and analyze the number of pages you ask it to.
Yes. On the settings page of WebsiteCrawler.org, there's a "custom tags" section where you select a project, enter a URL, and enter the tags you want this software to scrape (you must enter a CSS selector, e.g. div > p). Fill out this form and click the submit button. If the selector is valid and you see matched results below the form, it will be added to the list of tags that will be processed.
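To illustrate what a selector such as div > p matches, here is a small sketch using the BeautifulSoup library; it is purely for demonstration and is not part of WebsiteCrawler, which applies your selector on its own servers.

```python
# Illustrative only: shows what the CSS selector "div > p" matches.
from bs4 import BeautifulSoup

html = """
<div class="post">
  <p>First paragraph, a direct child of the div.</p>
  <span><p>Nested deeper, not a direct child.</p></span>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
# "div > p" selects only <p> elements that are direct children of a <div>
for p in soup.select("div > p"):
    print(p.get_text(strip=True))
# Output: First paragraph, a direct child of the div.
```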
WebsiteCrawler lets users download a website's data as a comma-separated values (CSV), Markdown, or JSON file. The generated JSON file contains a JSON array with one or more JSON objects. The time taken to download the file depends on the amount of data and your internet connection speed.
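As a rough illustration, an export of this shape can be read with a few lines of Python. The file name and field names below are hypothetical examples, not the exact keys WebsiteCrawler uses.

```python
# Minimal sketch of reading a downloaded JSON export.
# The keys ("url", "title", "status") are hypothetical; the actual keys
# depend on the data you chose to export.
import json

with open("websitecrawler_export.json", "r", encoding="utf-8") as f:
    pages = json.load(f)  # a JSON array of objects, one per crawled page

for page in pages:
    print(page.get("url"), page.get("title"), page.get("status"))
```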
Yes, this platform provides an API through which you can get data in an LLM-ready format as soon as the website data is in its database. You have to create an API key to use this feature. A few lines of code can integrate WebsiteCrawler with any LLM of your choice, provided it supports JSON data.
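A minimal sketch of what such an integration could look like is shown below. The endpoint URL, parameter names, and response structure are assumptions for illustration only; consult the WebsiteCrawler API documentation for the actual interface.

```python
# Minimal sketch: pull crawl data over the API and hand it to an LLM.
# The endpoint URL, parameter names, and response structure are assumptions
# for illustration; they are not the documented WebsiteCrawler API.
import json
import requests

API_KEY = "your-api-key"                                 # created in the dashboard
ENDPOINT = "https://websitecrawler.org/api/crawl-data"   # hypothetical endpoint

resp = requests.get(ENDPOINT, params={"domain": "example.com", "key": API_KEY}, timeout=30)
resp.raise_for_status()
pages = resp.json()  # JSON array of page objects

# Build a prompt from the structured data and send it to the LLM of your choice.
prompt = "Summarize the SEO issues in this crawl data:\n" + json.dumps(pages[:20])
# llm_client.complete(prompt)  # plug in whichever LLM client you use
```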
The crawl progress should appear within 15 to 20 seconds of clicking the button. If it does not, use the sitemap crawl function, i.e. enter the sitemap URL instead of the non-redirecting domain, and try again.