Overview:
The Crawl Web block is designed to systematically explore websites and extract (nested) links for various purposes, such as analysis, SEO enhancements, or competitive research.
Inputs & Outputs
| I/O | Feature | Type | Simple Explanation |
|---|---|---|---|
| input | url | string | The URL of the website you wish to crawl. |
| input | depth | number | Determines how many layers deep the crawler will go; max value is 3. |
| input | domain_whitelist (optional) | list | Restricts crawling to specified domains only. |
| output | url_list | string[] | A collection of all URLs found during the crawl process. |
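
To make the role of `depth` concrete, here is a minimal sketch of a depth-limited, breadth-first crawl. It is not the block's actual implementation: the function name `crawl`, the in-memory `links_on_page` mapping (standing in for real page fetches), and the reading of `depth` as "1 means links found on the start page, 2 also follows those links, and so on" are all assumptions made for illustration. Only the maximum of 3 comes from the table above.

```python
from collections import deque

MAX_DEPTH = 3  # upper limit stated in the inputs table


def crawl(start_url, depth, links_on_page):
    """Collect URLs reachable from start_url within `depth` layers.

    links_on_page is a hypothetical dict mapping a URL to the list of
    links found on that page; a real crawler would fetch and parse HTML.
    """
    if depth > MAX_DEPTH:
        raise ValueError(f"depth must not exceed {MAX_DEPTH}")

    seen = {start_url}
    url_list = []
    queue = deque([(start_url, 0)])  # (url, layer the url sits on)

    while queue:
        url, level = queue.popleft()
        if level >= depth:
            continue  # this page is at the last layer; do not expand it
        for link in links_on_page.get(url, []):
            if link not in seen:
                seen.add(link)
                url_list.append(link)
                queue.append((link, level + 1))
    return url_list
```

Under these assumptions, raising `depth` by one adds the links found on the pages discovered at the previous layer, which is why the result can grow quickly between depth 1 and depth 3.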
The domain whitelist accepts domains such as “keyflow.space”,
“www.keyflow.space”, or “docs.keyflow.space”. You can also specify full URLs
like “https://keyflow.space”; the crawler automatically extracts the domain
name and uses only that.
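
The reduction of a whitelist entry to its domain name could look roughly like the sketch below. The helper name `normalize_whitelist_entry` is hypothetical, and the block's real normalization rules (for example, how it treats case or ports) are not documented here; this only shows one plausible way to accept both bare domains and full URLs using Python's standard library.

```python
from urllib.parse import urlsplit


def normalize_whitelist_entry(entry: str) -> str:
    """Reduce a bare domain or a full URL to a lowercase hostname."""
    entry = entry.strip()
    # urlsplit only recognizes the host when a scheme or a leading "//"
    # is present, so bare domains get a "//" prefix first.
    if "://" not in entry:
        entry = "//" + entry
    # .hostname is lowercased and excludes any port; None if no host found.
    return urlsplit(entry).hostname or ""


whitelist = ["keyflow.space", "https://keyflow.space", "docs.keyflow.space"]
domains = sorted({normalize_whitelist_entry(e) for e in whitelist})
```

Here `domains` ends up holding two entries, since the bare domain and the full URL normalize to the same hostname.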
Use Cases
Consider how this block can be beneficial in various scenarios:
- Link Collection: Ideal for gathering all hyperlinks on a targeted website, which can assist marketers or researchers looking for resources.
- Website Structure Analysis: Use this tool to visualize and understand how different pages interconnect within a site, aiding developers in optimizations.
- SEO and Web Audit: Conduct thorough checks on internal and external links found across a website to improve SEO strategies or identify broken links.
- Competitive Insight: Map out the link structure of competitor sites or analyze your own sites for strategic content placement based on observed patterns.
The Crawl Web block is useful in each of these scenarios.
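
For the SEO and web audit scenario, a common first step after the crawl is to separate internal links from external ones. The sketch below post-processes a `url_list` outside the block; the function name `split_internal_external`, the sample URLs, and the rule that subdomains count as internal are illustrative assumptions, not behavior of the block itself.

```python
from urllib.parse import urlsplit


def split_internal_external(url_list, site_domain):
    """Partition URLs into (internal, external) relative to site_domain.

    A URL is treated as internal when its host equals site_domain or is
    a subdomain of it (an assumption; adjust to your own audit rules).
    """
    site_domain = site_domain.lower()
    internal, external = [], []
    for url in url_list:
        host = urlsplit(url).hostname or ""
        if host == site_domain or host.endswith("." + site_domain):
            internal.append(url)
        else:
            external.append(url)
    return internal, external


# Hypothetical crawl output used only for illustration.
sample_url_list = [
    "https://keyflow.space/pricing",
    "https://docs.keyflow.space/start",
    "https://example.com/partner",
]
internal, external = split_internal_external(sample_url_list, "keyflow.space")
```

The two resulting lists can then be checked separately, for instance by testing internal links for broken targets and reviewing external links for relevance.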