GoSpider Tutorial – Fast web crawler

Welcome to this comprehensive GoSpider tutorial. GoSpider is a powerful open-source tool developed by the Jaeles Project, designed for crawling web applications to find resources like directories, files, and external URLs. In this guide, we will walk you through GoSpider’s features, installation steps, and usage. Whether you’re a developer, security analyst, or just starting with web crawling, this guide is tailored to give you a solid foundation.

Features of GoSpider

Before diving into the installation and usage, let’s look at some of the key features GoSpider offers:

  • Fast web crawling using Go’s concurrency capabilities.
  • Discovery of subdomains, hidden directories, and files.
  • Support for parsing and fetching URLs from HTML, JavaScript, and more.
  • Customizable with regular expressions to match certain patterns.
  • Blacklisting of specific URLs or domains.
  • Granular control over headers, cookies, and proxies.

Installation Steps

To get started with GoSpider, follow these steps for installation:

go get -u github.com/jaeles-project/gospider

This command downloads the source and installs the gospider binary into your Go workspace's bin directory.
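
Note that on Go 1.17 and later, go get no longer builds and installs binaries. If the command above fails, use the module-aware form instead (same module path as above):

GO111MODULE=on go install github.com/jaeles-project/gospider@latest

Afterwards, make sure your Go bin directory (the path printed by go env GOPATH, plus /bin) is on your PATH so the gospider command resolves.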

Usage of GoSpider

To use GoSpider, simply type gospider in your terminal followed by a series of flags and arguments to perform your desired action.
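
Before running a crawl, you can list every supported flag and its default with the built-in help; -h is the conventional help flag for Go command-line tools and should print GoSpider's full flag reference:

gospider -h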

GoSpider Examples

Here we’ll walk through several examples of GoSpider’s capabilities:

1. Basic web crawling

gospider -s "https://example.com"

Output (the prefix varies by link type):

[url] - [code-200] - https://example.com/contact
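
Crawl results stream to stdout, so they can be piped into other tools. A minimal sketch for extracting only JavaScript URLs, assuming your build supports the -q/--quiet flag (which suppresses the banner and prints URLs only):

gospider -s "https://example.com" -q | grep -iE "\.js($|\?)"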

 

2. With concurrent threads

gospider -s "https://example.com/" -o output -c 5 -d 2

This command crawls the target site (-s "https://example.com/"), writes results to the output directory (-o output), allows up to 5 concurrent requests (-c 5), and limits the crawl depth to 2 (-d 2).
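
If you prefer machine-readable results for post-processing, GoSpider's documented options also include a --json flag; assuming your version supports it, the same crawl can emit its findings as JSON:

gospider -s "https://example.com/" -o output -c 5 -d 2 --json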

3. Run with site list

gospider -S sites.txt -o output -c 10 -d 1
  • -S sites.txt: Specifies a file containing the target URLs, one URL per line.
  • -o output: Sets the output directory; scan results are written into a directory named output.
  • -c 10: Sets the maximum number of concurrent requests GoSpider makes per matching domain, here up to 10 at a time.
  • -d 1: Sets the maximum crawl depth, i.e., how many levels of links GoSpider follows from each starting URL. At 1, it crawls only the pages listed in sites.txt and does not recurse into the links it finds (0 means unlimited recursion).
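
4. Custom headers, cookies, proxy, and blacklist

The sketch below ties back to the request-shaping features listed earlier. The flags themselves (-H, --cookie, -p, --blacklist) come from GoSpider's documented options, but the header, cookie, proxy address, and regex shown are placeholder values to replace with your own:

gospider -s "https://example.com" -o output -H "Authorization: Bearer TOKEN" --cookie "session=abc123" -p "http://127.0.0.1:8080" --blacklist "\.(png|jpg|woff)"

The --blacklist value is a regular expression; any discovered URL matching it is skipped during the crawl.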

Summary

Through this GoSpider tutorial, you’ve learned about its robust features and how to install and use the tool. These examples serve as a starting point to explore the numerous possibilities that GoSpider offers. Remember, GoSpider can be an invaluable addition to your toolkit for web application reconnaissance, ensuring you’re one step ahead in the ever-evolving landscape of web security.
