Crawl Web
Overview:
The Crawl Web block systematically explores websites and extracts nested links for purposes such as link analysis, SEO enhancement, or competitive research.
Inputs & Outputs
| I/O | Feature | Type | Description |
|---|---|---|---|
| input | url | string | The URL of the website you wish to crawl. |
| input | depth | number | How many layers deep the crawler will go; the maximum value is 3. |
| input | domain_whitelist (optional) | list | Restricts crawling to the specified domains only. |
| output | url_list | string[] | A list of all URLs found during the crawl. |
Setting the depth to 3 can take considerably longer to process: the crawler follows nested links at every level, so the number of pages visited can grow exponentially with depth. Use it wisely.
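The depth parameter can be pictured as a breadth-first traversal with a level cap: depth 1 collects the links on the start page, depth 2 follows those links one level further, and so on. Below is a minimal sketch of that idea, using a hypothetical in-memory link graph in place of real HTTP fetches (the block's actual internals are not documented here):

```python
from collections import deque

# Hypothetical in-memory link graph standing in for real pages;
# each key maps a URL to the links found on that page.
LINKS = {
    "https://keyflow.space": ["https://keyflow.space/docs", "https://keyflow.space/blog"],
    "https://keyflow.space/docs": ["https://keyflow.space/docs/blocks"],
    "https://keyflow.space/blog": [],
    "https://keyflow.space/docs/blocks": ["https://keyflow.space"],
}

def crawl(start_url, depth):
    """Breadth-first crawl capped at `depth` levels of nesting."""
    seen = {start_url}
    frontier = deque([(start_url, 0)])  # (url, level reached so far)
    url_list = []
    while frontier:
        url, level = frontier.popleft()
        if level >= depth:
            continue  # level cap reached: do not expand this page
        for link in LINKS.get(url, []):
            if link not in seen:  # cycles and duplicates are visited once
                seen.add(link)
                url_list.append(link)
                frontier.append((link, level + 1))
    return url_list
```

Note how the frontier roughly multiplies at each level, which is why raising the depth from 2 to 3 can be far more expensive than raising it from 1 to 2.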
The domain whitelist accepts domains such as "keyflow.space", "www.keyflow.space", or "docs.keyflow.space". You can also specify a full URL like "https://keyflow.space"; the crawler will automatically extract and use only the domain name.
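One plausible way this normalization could work is sketched below; the helper name is hypothetical and not part of the block's documented interface:

```python
from urllib.parse import urlparse

def normalize_whitelist_entry(entry):
    """Reduce a whitelist entry to a bare domain name.
    Accepts bare domains ("keyflow.space", "docs.keyflow.space")
    as well as full URLs ("https://keyflow.space/some/path")."""
    parsed = urlparse(entry)
    # urlparse only fills `netloc` when a scheme ("https://") is present,
    # so bare domains fall back to the path component.
    domain = parsed.netloc or parsed.path
    return domain.split("/")[0].lower()
```

For example, "https://docs.keyflow.space/page" and "docs.keyflow.space" both reduce to the same domain, so either form restricts the crawl identically.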
Use Cases
Consider how this block can be beneficial in various scenarios:
- Link Collection: Ideal for gathering all hyperlinks on a targeted website, which can assist marketers or researchers looking for resources.
- Website Structure Analysis: Use this tool to visualize and understand how different pages interconnect within a site, aiding developers in optimizations.
- SEO and Web Audit: Conduct thorough checks on internal and external links found across a website to improve SEO strategies or identify broken links.
- Competitive Insight: Map out the link structure of competitor sites or analyze your own sites for strategic content placement based on observed patterns.
In summary, whenever you need to reveal or analyze the hidden connections within a website, the Crawl Web block proves invaluable!