The Crawl Web block is designed to systematically explore websites and extract (nested) links for various purposes, such as analysis, SEO enhancements, or competitive research.
I/O | Feature | Type | Simple Explanation |
---|---|---|---|
input | url | string | The URL of the website you wish to crawl. |
input | depth | number | Determines how many layers deep the crawler will go; max value is 3. |
input | domain_whitelist (optional) | list | Restricts crawling to specified domains only. |
output | url_list | string[] | A collection of all URLs found during the crawl process. |
Setting depth to 3 can noticeably increase processing time, because the number of nested links to crawl can grow exponentially with each additional layer. Use it only when you need that level of coverage.
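To illustrate why each extra layer adds work, here is a minimal sketch of a depth-limited, breadth-first crawl. It is not the block's actual implementation: the `crawl` function, the `LINKS` graph (a stand-in for real pages, so no network access is needed), and the choice to exclude the starting URL from `url_list` are all assumptions made for illustration.

```python
from collections import deque

# Hypothetical link graph standing in for real pages: page URL -> links found on it.
LINKS = {
    "https://example.com/": ["https://example.com/a", "https://example.com/b"],
    "https://example.com/a": ["https://example.com/a/1", "https://example.com/a/2"],
    "https://example.com/b": ["https://example.com/b/1"],
    "https://example.com/a/1": ["https://example.com/a/1/x"],
}

MAX_DEPTH = 3  # the documented maximum for the depth input


def crawl(start_url, depth, links=LINKS):
    """Return every URL discovered within `depth` layers of `start_url`."""
    if not 1 <= depth <= MAX_DEPTH:
        raise ValueError(f"depth must be between 1 and {MAX_DEPTH}")

    seen = {start_url}                 # avoids visiting the same URL twice
    url_list = []                      # discovered URLs, in discovery order
    queue = deque([(start_url, 0)])    # (page, layer at which it was found)

    while queue:
        page, level = queue.popleft()
        if level == depth:
            continue                   # do not follow links past the requested depth
        for link in links.get(page, []):
            if link not in seen:
                seen.add(link)
                url_list.append(link)
                queue.append((link, level + 1))
    return url_list


if __name__ == "__main__":
    for d in (1, 2, 3):
        print(d, len(crawl("https://example.com/", d)))
```

In this toy graph the result grows from 2 URLs at depth 1 to 5 at depth 2 and 6 at depth 3. On real websites, where each page may link to dozens of others, the growth per layer is typically much steeper.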
The domain whitelist accepts domains such as “keyflow.space”, “www.keyflow.space”, or “docs.keyflow.space”. You can also specify full URLs like “https://keyflow.space”; the crawler automatically extracts the domain name and ignores the rest.
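The following sketch shows one way a whitelist entry could be reduced to its domain name and then used to filter URLs. The helper names `to_domain` and `is_allowed` are hypothetical, and the exact-match rule (where “keyflow.space” does not automatically cover “docs.keyflow.space”) is an assumption, since the block's own matching behavior is not documented here.

```python
from urllib.parse import urlparse


def to_domain(entry):
    """Reduce a bare domain or a full URL to its lowercase domain name."""
    entry = entry.strip()
    # urlparse only fills in the hostname when a scheme or a leading "//" is present,
    # so bare domains such as "keyflow.space" get a "//" prefix first.
    parsed = urlparse(entry if "://" in entry else "//" + entry)
    return (parsed.hostname or "").lower()


def is_allowed(url, domain_whitelist):
    """Assumed rule: a URL passes only if its domain exactly matches a whitelist entry."""
    allowed = {to_domain(entry) for entry in domain_whitelist}
    return to_domain(url) in allowed


if __name__ == "__main__":
    whitelist = ["https://keyflow.space", "docs.keyflow.space"]
    print(is_allowed("https://docs.keyflow.space/guide", whitelist))
    print(is_allowed("https://example.com/", whitelist))
```

Because matching here is exact, listing each subdomain you want crawled is the safest way to reason about the sketch's behavior.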
Consider how this block can be beneficial in various scenarios:
In summary, whenever you need to reveal or analyze hidden connections within websites, the Crawl Web block proves invaluable!