Inventus is a spider designed to find subdomains of a specific domain by crawling it and any subdomains it discovers. It's a Scrapy spider, so it is easy to modify and extend to suit your needs.
- Linux -- I haven't tested this on Windows.
- Python 2.7 or Python 3.3+
- Scrapy 1.4.0 or above.
Inventus requires Scrapy to be installed before it can be run. First, clone the repo and enter it.
$ git clone https://github.com/nmalcolm/Inventus
$ cd Inventus
Now install the required dependencies using pip.
$ pip install -r requirements.txt
Assuming the installation succeeded, Inventus should be ready to use.
The most basic usage of Inventus is as follows:
$ cd Inventus
$ scrapy crawl inventus -a domain=facebook.com
This tells Scrapy which spider to use ("inventus" in this case) and passes the domain to the spider. Any subdomains found will be sent to STDOUT.
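To illustrate what "a subdomain of the given domain" means when crawling, here is a minimal Python 3 sketch of a scope check. The function name in_scope is hypothetical and this is not taken from Inventus's source; the spider's actual matching logic may differ.

```python
from urllib.parse import urlparse


def in_scope(url, domain):
    """Return True if the host in `url` is `domain` itself or a subdomain of it."""
    host = (urlparse(url).hostname or "").lower().rstrip(".")
    domain = domain.lower().rstrip(".")
    # Require a dot boundary so "notfacebook.com" does not match "facebook.com".
    return host == domain or host.endswith("." + domain)


# Made-up sample links, used only to exercise the check.
links = [
    "https://developers.facebook.com/docs",
    "https://notfacebook.com/",
    "http://facebook.com/about",
]
matching = [link for link in links if in_scope(link, "facebook.com")]
```

Comparing against "." + domain rather than the bare domain is the important detail: a plain suffix match would also accept unrelated hosts that merely end with the same characters.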
The other custom parameter is subdomain_limit. This sets the maximum number of subdomains to discover before quitting. The default value is 10000, but it isn't a hard limit.
$ scrapy crawl inventus -a domain=facebook.com -a subdomain_limit=100
Exporting data can be done in multiple ways. The easiest way is redirecting STDOUT to a file.
$ scrapy crawl inventus -a domain=facebook.com > facebook.txt
Scrapy has a built-in feature which allows you to export items into various formats, including CSV, JSON, and XML. Currently only subdomains will be exported; however, this may change in the future.
$ scrapy crawl inventus -a domain=facebook.com -t csv -o Facebook.csv
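Once the results are in a CSV file, they can be post-processed with a few lines of Python. The sketch below assumes the export has a header row with a column named "subdomain"; that column name is a guess, so check the header of your own file and adjust it. The function name load_subdomains is also hypothetical.

```python
import csv
import io


def load_subdomains(csv_file, column="subdomain"):
    """Return the sorted, de-duplicated values of `column` from an open CSV file."""
    reader = csv.DictReader(csv_file)
    # Skip rows where the column is missing or empty; normalise case and whitespace.
    return sorted({row[column].strip().lower() for row in reader if row.get(column)})


# Stand-in for open("Facebook.csv", newline=""); the rows are made-up sample data.
sample = io.StringIO(
    "subdomain\nwww.facebook.com\ndevelopers.facebook.com\nwww.facebook.com\n"
)
print(load_subdomains(sample))
```

With a real export, replace the io.StringIO stand-in with a file opened via open("Facebook.csv", newline="").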
Inventus's behavior can be configured. By default, Inventus ignores robots.txt, has a 30-second timeout, caches crawl data for 24 hours, has a crawl depth of 5, and uses Scrapy's AutoThrottle extension. These and more can all be changed by editing the inventus_spider/settings.py file. Scrapy's settings are well documented too.
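As a rough guide, the defaults described above correspond to the following standard Scrapy settings. This is a sketch of how such a settings file could look, not a copy of the project's settings.py, which may use different names or additional options.

```python
# Hypothetical excerpt in the style of inventus_spider/settings.py.
# The setting names are standard Scrapy settings; the values mirror the
# defaults described in this README and are assumptions about the real file.
ROBOTSTXT_OBEY = False                    # ignore robots.txt
DOWNLOAD_TIMEOUT = 30                     # seconds before a request is abandoned
HTTPCACHE_ENABLED = True                  # cache crawl data locally
HTTPCACHE_EXPIRATION_SECS = 24 * 60 * 60  # cached responses expire after 24 hours
DEPTH_LIMIT = 5                           # maximum crawl depth
AUTOTHROTTLE_ENABLED = True               # adapt request rate to server latency
```

For example, setting ROBOTSTXT_OBEY to True would make the crawl respect robots.txt, and lowering DEPTH_LIMIT would shorten the crawl at the cost of finding fewer subdomains.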