New Web Site Crawler Scenarios!
July 2, 2018
Many customers have requested that RedWolf add web site crawler capability to the platform. Crawlers are used to enumerate pages on web sites and scrape information. These crawlers are designed to exhibit realistic usage patterns to avoid detection by WAFs and DDoS mitigation devices.
RedWolf is pleased to announce that automated Web Site Crawling scenarios are now available to all RedWolf users with dedicated platform licenses.
Three types of web site crawlers have been added:
- Real Browser (Firefox) – can save screenshots and HAR files. Processes all cookies and JavaScript.
- Headless Scraper – a popular scraping framework that simulates a real browser and runs JavaScript, but does not reproduce every browser feature authentically.
- Recursive Bot – a much simpler profile that recurses through a page pattern to a specified depth. Can be set to ignore certain types of content, or to pursue only HTML links. Does not run JavaScript but does set cookies.
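To make the Recursive Bot idea concrete, here is a minimal Python sketch of a depth-limited crawl that follows only HTML links and skips certain content types. It is not RedWolf's implementation: the names `crawl`, `LinkExtractor`, `fetch`, and `skip_extensions` are hypothetical, the page source is passed in as a function so the example runs without network access, and cookie handling is left out.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects href values from <a> tags only (HTML links, not images or scripts)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, max_depth, skip_extensions=(".jpg", ".png", ".css", ".js")):
    """Breadth-first crawl of one site, stopping at max_depth links from the start.

    `fetch` is a hypothetical callable that returns the HTML for a URL, or None.
    Returns the URLs in the order they were visited.
    """
    site = urlparse(start_url).netloc
    seen = {start_url}
    visited = []
    queue = deque([(start_url, 0)])
    while queue:
        url, depth = queue.popleft()
        visited.append(url)
        html = fetch(url)
        if html is None or depth == max_depth:
            continue  # nothing to parse, or depth limit reached
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            target = urljoin(url, href)
            parts = urlparse(target)
            if parts.netloc != site:
                continue  # stay on the starting site
            if parts.path.lower().endswith(skip_extensions):
                continue  # ignore configured content types
            if target not in seen:
                seen.add(target)
                queue.append((target, depth + 1))
    return visited


# Usage with an in-memory "site" instead of real HTTP requests.
pages = {
    "http://example.test/": '<a href="/a">A</a> <a href="/logo.png">logo</a> '
                            '<a href="https://other.test/x">external</a>',
    "http://example.test/a": '<a href="/b">B</a> <a href="/">home</a>',
    "http://example.test/b": '<a href="/c">C</a>',
}
print(crawl("http://example.test/", pages.get, max_depth=1))
```

With `max_depth=1` the sketch visits the start page and `/a` only; the image link and the external link are never queued.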
In the world of web crawlers, it has become necessary to seem ‘human’.
To that end, the crawlers can create delays and pauses:
- Arbitrary fixed delay between pages – establishes a minimum time between link ‘clicks’
- Random delay between pages – makes traffic more ‘user-like’ by introducing random delays from 1 second to 60 seconds
- Periodic long pauses – allows pauses of up to 48 hours
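As a rough illustration of how the three delay types might combine, the Python sketch below computes the wait before each page. It assumes the fixed and random delays are added together and that a long pause is inserted every N pages; the platform may combine them differently. The function name `next_delay` and all of its parameters are hypothetical.

```python
import random

MAX_LONG_PAUSE_S = 48 * 3600  # long pauses of up to 48 hours


def next_delay(fixed_s, random_min_s=1.0, random_max_s=60.0,
               long_pause_every=0, long_pause_s=0.0, page_index=0, rng=random):
    """Return the number of seconds to wait before requesting the next page.

    fixed_s          -- minimum time between pages
    random_min_s/max -- bounds of the extra random delay (1 to 60 seconds by default)
    long_pause_every -- insert a long pause every N pages (0 disables it)
    long_pause_s     -- length of that long pause
    page_index       -- how many pages have been requested so far
    """
    if long_pause_s > MAX_LONG_PAUSE_S:
        raise ValueError("long pause may not exceed 48 hours")
    delay = fixed_s + rng.uniform(random_min_s, random_max_s)
    if long_pause_every and page_index > 0 and page_index % long_pause_every == 0:
        delay += long_pause_s
    return delay


# Usage: a 2-second floor, 1 to 60 seconds of jitter, and a 1-hour pause every 25 pages.
rng = random.Random(7)  # seeded only so repeated runs give the same schedule
schedule = [
    next_delay(2.0, long_pause_every=25, long_pause_s=3600.0, page_index=i, rng=rng)
    for i in range(1, 51)
]
```

A real crawler would pass each value to `time.sleep` before fetching the next page.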
Compared to regular desktops, RedWolf’s agents are very powerful and extremely well connected to the Internet. It therefore became necessary to simulate slow DSL connections so that the agents would not give themselves away with their high transfer speeds.
To that end, the crawlers can simulate slow Internet connections:
- Ability to set the upload and download bitrate per instance
- This allows asymmetric DSL connections to be simulated
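One common way to cap a transfer rate is to pace each chunk of data so that the average rate never exceeds a target bitrate. The Python sketch below shows that idea with separate upload and download limits to mimic an asymmetric DSL line. It illustrates the general technique and is not how the platform enforces bitrates; the `Throttle` class, its `consume` method, and the example rates are hypothetical.

```python
import time


class Throttle:
    """Paces byte transfers so the average rate does not exceed a target bitrate."""

    def __init__(self, bits_per_second, sleep=time.sleep, clock=time.monotonic):
        if bits_per_second <= 0:
            raise ValueError("bitrate must be positive")
        self.bytes_per_second = bits_per_second / 8
        self._sleep = sleep    # injectable so the pacing can be tested without waiting
        self._clock = clock
        self._start = None     # time of the first transfer
        self._sent = 0         # total bytes accounted for so far

    def consume(self, nbytes):
        """Account for nbytes and sleep until the target rate allows them.

        Returns the number of seconds slept.
        """
        now = self._clock()
        if self._start is None:
            self._start = now
        self._sent += nbytes
        # Earliest moment at which this many bytes fit within the target rate.
        earliest = self._start + self._sent / self.bytes_per_second
        wait = max(earliest - now, 0.0)
        if wait > 0:
            self._sleep(wait)
        return wait


# Usage: an asymmetric line with 8 Mbit/s down and 1 Mbit/s up (example rates only).
download = Throttle(8_000_000)
upload = Throttle(1_000_000)
# After reading a 64 KiB response chunk: download.consume(65536)
# After sending a 2 KiB request body:    upload.consume(2048)
```

Calling `consume` after each chunk read or written keeps the long-run average at or below the configured rate, while short bursts are absorbed by the next sleep.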
These new capabilities support a variety of use cases, including:
- Simulating search engines
- Scraping content of entire web sites
- Taking screenshots of every page of a web site
- New DDoS attack types that are very hard for most WAFs to detect
- Testing advanced abuse detection / web log correlation
To learn more about web crawlers please visit this link.