Class CrawlOptions.Builder

  • Enclosing class:
    CrawlOptions

    public static final class CrawlOptions.Builder
    extends java.lang.Object
    • Method Detail

      • prompt

        public CrawlOptions.Builder prompt(java.lang.String prompt)
        Natural language prompt to guide crawling.
      • excludePaths

        public CrawlOptions.Builder excludePaths(java.util.List<java.lang.String> excludePaths)
        URL path patterns to exclude from crawling.
      • includePaths

        public CrawlOptions.Builder includePaths(java.util.List<java.lang.String> includePaths)
        URL path patterns to include in crawling.
      • maxDiscoveryDepth

        public CrawlOptions.Builder maxDiscoveryDepth(java.lang.Integer maxDiscoveryDepth)
        Maximum link depth to follow when discovering pages.
      • sitemap

        public CrawlOptions.Builder sitemap(java.lang.String sitemap)
        Sitemap handling: "skip", "include", or "only".
      • ignoreQueryParameters

        public CrawlOptions.Builder ignoreQueryParameters(java.lang.Boolean ignoreQueryParameters)
        Ignore query parameters when deduplicating URLs.
      • deduplicateSimilarURLs

        public CrawlOptions.Builder deduplicateSimilarURLs(java.lang.Boolean deduplicateSimilarURLs)
        Deduplicate URLs that are similar.
      • limit

        public CrawlOptions.Builder limit(java.lang.Integer limit)
        Maximum number of pages to crawl.
      • crawlEntireDomain

        public CrawlOptions.Builder crawlEntireDomain(java.lang.Boolean crawlEntireDomain)
        Whether to crawl the entire domain.
      • allowExternalLinks

        public CrawlOptions.Builder allowExternalLinks(java.lang.Boolean allowExternalLinks)
        Whether to follow links to external sites.
      • allowSubdomains

        public CrawlOptions.Builder allowSubdomains(java.lang.Boolean allowSubdomains)
        Whether to follow links to subdomains.
      • ignoreRobotsTxt

        public CrawlOptions.Builder ignoreRobotsTxt(java.lang.Boolean ignoreRobotsTxt)
        Ignore the website's robots.txt rules. Enterprise only.
      • robotsUserAgent

        public CrawlOptions.Builder robotsUserAgent(java.lang.String robotsUserAgent)
        Custom User-Agent string for robots.txt evaluation. Enterprise only.
      • delay

        public CrawlOptions.Builder delay(java.lang.Integer delay)
        Delay in milliseconds between requests.
      • maxConcurrency

        public CrawlOptions.Builder maxConcurrency(java.lang.Integer maxConcurrency)
        Maximum number of concurrent requests.
      • regexOnFullURL

        public CrawlOptions.Builder regexOnFullURL(java.lang.Boolean regexOnFullURL)
        Apply regex patterns to the full URL, not just the path.
      • zeroDataRetention

        public CrawlOptions.Builder zeroDataRetention(java.lang.Boolean zeroDataRetention)
        Do not store any scraped data on Firecrawl servers.
      • integration

        public CrawlOptions.Builder integration(java.lang.String integration)
        Integration identifier.
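
    • Usage sketch

        Every method above returns CrawlOptions.Builder, so calls can be chained. The sketch below illustrates that pattern with a hypothetical stand-in class that mirrors only four of the documented methods (includePaths, excludePaths, limit, sitemap). The builder() entry point, the build() method, the CrawlOptionsSketch wrapper, and the example path patterns are assumptions for illustration; this page does not show how a Builder is obtained or finalized, so check the enclosing CrawlOptions class for the actual calls.

```java
import java.util.List;

public class CrawlOptionsSketch {

    // Hypothetical stand-in for the real CrawlOptions class; it mirrors only
    // four of the documented builder methods so the example is self-contained.
    static final class CrawlOptions {
        final List<String> includePaths;
        final List<String> excludePaths;
        final Integer limit;
        final String sitemap;

        private CrawlOptions(Builder b) {
            this.includePaths = b.includePaths;
            this.excludePaths = b.excludePaths;
            this.limit = b.limit;
            this.sitemap = b.sitemap;
        }

        // Assumed entry point: the documented page does not show how a
        // Builder instance is created.
        static Builder builder() {
            return new Builder();
        }

        static final class Builder {
            private List<String> includePaths;
            private List<String> excludePaths;
            private Integer limit;
            private String sitemap;

            private Builder() {}

            // Each setter stores its value and returns this Builder,
            // which is what makes chaining possible.
            Builder includePaths(List<String> includePaths) {
                this.includePaths = includePaths;
                return this;
            }

            Builder excludePaths(List<String> excludePaths) {
                this.excludePaths = excludePaths;
                return this;
            }

            Builder limit(Integer limit) {
                this.limit = limit;
                return this;
            }

            Builder sitemap(String sitemap) {
                this.sitemap = sitemap;
                return this;
            }

            // Assumed terminal method that produces the options object.
            CrawlOptions build() {
                return new CrawlOptions(this);
            }
        }
    }

    public static void main(String[] args) {
        // Illustrative values only; the path patterns are made up.
        CrawlOptions options = CrawlOptions.builder()
                .includePaths(List.of("/docs/.*"))
                .excludePaths(List.of("/docs/archive/.*"))
                .limit(50)
                .sitemap("include")
                .build();

        System.out.println(options.limit + " pages max, sitemap=" + options.sitemap);
    }
}
```

        Options that are never set stay null in this stand-in, which matches the boxed parameter types (java.lang.Integer, java.lang.Boolean) in the signatures above. Whether the real class treats null as "use the server default" is not stated on this page.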