
[WAS] Prevent the crawler from generating random URLs

Question asked by Łukasz Szczyrba on Aug 18, 2015
Latest reply on Jan 8, 2017 by Dave Ferguson

Hi

 

Is there any way to prevent the crawler from generating random URLs? For example, the crawler tries to access www.mydomain.com/Z1KAFgLJFBaG.html or /oM7KAU1b1QHh.html.

 

In addition, I would like to limit the scan so it only accesses URLs actually discovered on the page (or URLs matching a specific name).

I've tried setting exclusion lists so that only URLs containing the text 'abc' are scanned:

black list: (url) http://www.mydomain.com

white list: (regex) http:\/\/www\.mydomain\.com.*abc.* (or simply abc)

But the crawler still scans the whole site anyway (the links show up under Links Rejected, Links Rejected By Crawl Scope or Exclusion List, and Links Crawled).
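
As a sanity check, the whitelist pattern itself does match only URLs containing 'abc' when tested outside the scanner, e.g. with Python's re module (which is only a rough approximation of whatever regex engine the scanner uses; the sample URLs below are made up for illustration):

import re

# Minimal sketch: verify the whitelist pattern outside the scanner.
# Python's re module may differ from the scanner's regex engine.
whitelist = re.compile(r"http:\/\/www\.mydomain\.com.*abc.*")

urls = [
    "http://www.mydomain.com/abc/page.html",      # contains 'abc' -> should be kept
    "http://www.mydomain.com/Z1KAFgLJFBaG.html",  # random URL -> should be rejected
    "http://www.mydomain.com/oM7KAU1b1QHh.html",  # random URL -> should be rejected
]

for url in urls:
    result = "match" if whitelist.search(url) else "no match"
    print(result + ": " + url)

This prints "match" only for the 'abc' URL, so the pattern seems fine; the problem appears to be in how the scanner applies the lists, not in the regex itself.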

 

Thanks for your help.
