Overview

The Crawl Web block is designed to systematically explore websites and extract (nested) links for various purposes, such as analysis, SEO enhancements, or competitive research.


Inputs & Outputs

I/O      Feature                       Type      Simple Explanation
input    url                           string    The URL of the website you wish to crawl.
input    depth                         number    How many layers deep the crawler will go; the maximum value is 3.
input    domain_whitelist (optional)   list      Restricts crawling to the specified domains only.
output   url_list                      string[]  A collection of all URLs found during the crawl.
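Conceptually, the block performs a breadth-first, depth-limited crawl. The sketch below illustrates that behavior only; it is not the block's actual implementation, and the `fetch_links` helper (which would return the absolute URLs found on a page) is hypothetical.

```python
from urllib.parse import urlparse

def crawl(url, depth, domain_whitelist=None, fetch_links=None):
    """Breadth-first crawl up to `depth` levels, returning every URL found."""
    allowed = {d.lower() for d in (domain_whitelist or [])}
    seen, frontier, found = {url}, [url], []
    for _ in range(depth):
        next_frontier = []
        for page in frontier:
            for link in fetch_links(page):  # absolute URLs on the page
                domain = urlparse(link).netloc.lower()
                if allowed and domain not in allowed:
                    continue  # skip links outside the domain whitelist
                found.append(link)
                if link not in seen:   # only follow each URL once
                    seen.add(link)
                    next_frontier.append(link)
        frontier = next_frontier
    return found
```

Note that `url_list` collects every link encountered at every level, while the `seen` set keeps the crawler from re-fetching the same page twice.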

Setting a depth of 3 can significantly increase processing time: the crawler follows nested links, so the number of pages visited can grow exponentially with depth. Use it wisely.
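To see why, here is a rough upper bound on the pages visited, assuming (hypothetically) an average of `links_per_page` links on each page:

```python
def max_pages(links_per_page: int, depth: int) -> int:
    """Worst-case number of pages visited by a depth-limited crawl.

    Depth 1 visits the start page's links, depth 2 visits their links, etc.
    """
    return sum(links_per_page ** d for d in range(1, depth + 1))

print(max_pages(20, 1))  # 20
print(max_pages(20, 3))  # 20 + 400 + 8000 = 8420
```

With just 20 links per page, moving from depth 1 to depth 3 grows the worst case from 20 pages to 8,420.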

The domain whitelist accepts domains such as “keyflow.space”, “www.keyflow.space”, or “docs.keyflow.space”. You can also pass a full URL such as “https://keyflow.space”; the crawler automatically reduces it to its domain name.
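That normalization step can be sketched with the standard library's `urllib.parse` (again, an illustration of the described behavior, not the block's actual code):

```python
from urllib.parse import urlparse

def normalize_whitelist_entry(entry: str) -> str:
    """Reduce a whitelist entry (bare domain or full URL) to a domain name."""
    # urlparse only fills `netloc` when a scheme (or "//") is present,
    # so prefix bare domains like "keyflow.space" before parsing.
    parsed = urlparse(entry if "//" in entry else f"//{entry}")
    return parsed.netloc.lower()

print(normalize_whitelist_entry("https://keyflow.space"))  # keyflow.space
print(normalize_whitelist_entry("docs.keyflow.space"))     # docs.keyflow.space
```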


Use Cases

Consider how this block can be beneficial in various scenarios:

  • Link Collection: Ideal for gathering all hyperlinks on a targeted website, which can assist marketers or researchers looking for resources.
  • Website Structure Analysis: Use this tool to visualize and understand how different pages interconnect within a site, aiding developers in optimizations.
  • SEO and Web Audit: Conduct thorough checks on internal and external links found across a website to improve SEO strategies or identify broken links.
  • Competitive Insight: Map out the link structure of competitor sites or analyze your own sites for strategic content placement based on observed patterns.

In summary, whenever you need to surface or analyze the link structure of a website, the Crawl Web block proves invaluable!