Skipfish是一个积极的Web应用程序的安全性侦察工具。 它准备了一个互动为目标的网站的站点地图进行一个递归爬网和基于字典的探头。 然后，将得到的地图是带注释的与许多活性（但希望非破坏性的）安全检查的输出。 最终报告工具生成的是，作为一个专业的网络应用程序安全评估的基础。
How to run the scanner?
To compile it, simply unpack the archive and try make. Chances are, you will need to install http://ftp.gnu.org/gnu/libidn/libidn-1.18.tar.gz‘>libidn or http://www.pcre.org/‘>libpcre3 first.
Next, you need to read the instructions provided in doc/dictionaries.txt to select the right dictionary file and configure it correctly. This step has a profound impact on the quality of scan results later on, so don’t skip it.
Once you have the dictionary selected, you can use -S to load that dictionary,
and -W to specify an initially empty file for any newly learned site-specific
keywords (which will come handy in future assessments):
$ touch new_dict.wl
$ ./skipfish -o output_dir -S existing_dictionary.wl -W new_dict.wl \
You can use -W- if you don’t want to store auto-learned keywords anywhere.
Note that you can provide more than one starting URL if so desired; all of them will be crawled. You can also read a list of URLs from a file using this syntax:
$ ./skipfish …other options… -o output_dir @/path/to/url_list.txt
The tool will display some helpful stats while the scan is in progress. You
can also switch to a list of in-flight HTTP requests by pressing return.
A simple companion script, sfscandiff, can be used to compute a delta for two scans executed against the same target with the same flags. The newer report will be non-destructively annotated by adding red background to all new or changed nodes; and blue background to all new or changed issues found.
Some sites may require authentication; for simple HTTP credentials, you can try:
$ ./skipfish -A user:pass …other parameters…
Alternatively, if the site relies on HTTP cookies instead, log in in your browser or using
a simple curl script, and then provide skipfish with a session cookie:
$ ./skipfish -C name=val …other parameters…
Other session cookies may be passed the same way, one per each -C option.
Certain URLs on the site may log out your session; you can combat this in two ways: by using the -N option, which causes the scanner to reject attempts to set or delete cookies; or with the -X parameter, which prevents matching URLs from being fetched:
$ ./skipfish -X /logout/logout.aspx …other parameters…
The -X option is also useful for speeding up your scans by excluding /icons/, /doc/,
/manuals/, and other standard, mundane locations along these lines. In general, you can
use -X and -I (only spider URLs matching a substring) to limit the scope of a scan any way you like - including restricting it only to a specific protocol and port:
$ ./skipfish -I http://example.com:1234/ …other parameters…
A related function, -K, allows you to specify parameter names not to fuzz
(useful for applications that put session IDs in the URL, to minimize noise).
Another useful scoping option is -D - allowing you to specify additional hosts or domains to consider in-scope for the test. By default, all hosts appearing in the command-line URLs are added to the list - but you can use -D to broaden these rules, for example:
$ ./skipfish -D test2.example.com -o output-dir http://test1.example.com/
…or, for a domain wildcard match, use:
$ ./skipfish -D .example.com -o output-dir http://test1.example.com/
In some cases, you do not want to actually crawl a third-party domain, but you trust the owner of that domain enough not to worry about cross-domain content inclusion from that location. To suppress warnings, you can use the -B option, for example:
$ ./skipfish -B .google-analytics.com -B .googleapis.com …other parameters…
By default, skipfish sends minimalistic HTTP headers to reduce the amount of data exchanged over the wire; some sites examine User-Agent strings or header ordering to reject unsupported clients, however. In such a case, you can use -b ie or -b ffox to mimic one of the two popular browsers; and -b phone to mimic iPhone.
When it comes to customizing your HTTP requests, you can also use the -H option to insert any additional, non-standard headers (including an arbitrary User-Agent value); or -F to define a custom mapping between a host and an IP (bypassing the resolver). The latter feature is particularly useful for not-yet-launched or legacy services.
Some sites may be too big to scan in a reasonable timeframe. If the site features well-defined tarpits - for example, 100,000 nearly identical user profiles as a part of a social network - these specific locations can be excluded with -X or -S. In other cases, you may need to resort to other settings: -d limits crawl depth to a specified number of subdirectories; -c limits the number of children per directory, -x limits the total number of descendants per crawl tree branch; and -r limits the total number of requests to send in a scan.
An interesting option is available for repeated assessments: -p. By specifying a percentage between 1 and 100%, it is possible to tell the crawler to follow fewer than 100% of all links, and try fewer than 100% of all dictionary entries. This - naturally - limits the completeness of a scan, but unlike most other settings, it does so in a balanced, non-deterministic manner. It is extremely useful when you are setting up time-bound, but periodic assessments of your infrastructure. Another related option is -q, which sets the initial random seed for the crawler to a specified value. This can be used to exactly reproduce a previous scan to compare results. Randomness is relied upon most heavily in the -p mode, but also for making a couple of other scan management decisions elsewhere.
In certain quick assessments, you might also have no interest in paying any particular attention to the desired functionality of the site - hoping to explore non-linked secrets only. In such a case, you may specify -P to inhibit all HTML parsing. This limits the coverage and takes away the ability for the scanner to learn new keywords by looking at the HTML, but speeds up the test dramatically. Another similarly crippling option that reduces the risk of persistent effects of a scan is -O, which inhibits all form parsing
and submission steps.
Some sites that handle sensitive user data care about SSL - and about getting it right. Skipfish may optionally assist you in figuring out problematic mixed content or password submission scenarios - use the -M option to enable this. The scanner will complain about situations such as http:// scripts being loaded on https:// pages - but will disregard non-risk scenarios such as images.
Likewise, certain pedantic sites may care about cases where caching is restricted on HTTP/1.1 level, but no explicit HTTP/1.0 caching directive is given on specifying -E in the command-line causes skipfish to log all such cases carefully.
In some occasions, you want to limit the requests per second to limit the load on the targets server (or possibly bypass DoS protection). The -l flag can be used to set this limit and the value given is the maximum amount of requests per second you want skipfish to perform.
Scans typically should not take weeks. In many cases, you probably want to limit the scan duration so that it fits within a certain time window. This can be done with the -k flag, which allows the amount of hours, minutes and seconds to be specified in a H:M:S format. Use of
this flag can affect the scan coverage if the scan timeout occurs before testing all pages.
Lastly, in some assessments that involve self-contained sites without extensive user content, the auditor may care about any external e-mails or HTTP links seen, even if they have no immediate security impact. Use the -U option to have these logged.
Dictionary management is a special topic, and - as mentioned - is covered in more detail in dictionaries/README-FIRST. Please read that file before proceeding. Some of the relevant options include -S and -W (covered earlier), -L to suppress auto-learning, -G to limit the keyword guess jar size, -R to drop old dictionary entries, and -Y to inhibit expensive keyword. k e y w o r d . extension fuzzing.
Skipfish also features a form auto-completion mechanism in order to maximize scan coverage. The values should be non-malicious, as they are not meant to implement security checks - but rather, to get past input validation logic. You can define additional rules, or override existing ones, with the -T option (-T form_field_name=field_value, e.g. -T login=test123 -T password=test321 - although note that -C and -A are a much better method of logging in).
There is also a handful of performance-related options. Use -g to set the maximum number of connections to maintain, globally, to all targets (it is sensible to keep this under 50 or so to avoid overwhelming the TCP/IP stack on your system or on the nearby NAT / firewall devices); and -m to set the per-IP limit (experiment a bit: 2-4 is usually good for localhost, 4-8 for local networks, 10-20 for external targets, 30+ for really lagged or non-keep-alive hosts). You can also use -w to set the I/O timeout (i.e., skipfish will wait only so long for an individual read or write), and -t to set the total request timeout, to account for really slow or really fast sites.
Lastly, -f controls the maximum number of consecutive HTTP errors you are willing to see before aborting the scan; and -s sets the maximum length of a response to fetch and parse (longer responses will be truncated).
When scanning large, multimedia-heavy sites, you may also want to specify -e. This prevents binary documents from being kept in memory for reporting purposes, freeing up a lot of RAM.
Further rate-limiting is available through third-party user mode tools such as http://monkey.org/~marius/trickle/‘>trickle, or kernel-level traffic shaping.
Oh, and real-time scan statistics can be suppressed with -u.
But seriously, how to run it?
A standard, authenticated scan of a well-designed and self-contained site (warns about all external links, e-mails, mixed content, and caching header issues), including gentle brute-force:
$ touch new_dict.wl
$ ./skipfish -MEU -S dictionaries/minimal.wl -W new_dict.wl \
Five-connection crawl, but no brute-force; pretending to be MSIE and caring less about ambiguous MIME or character set mismatches, and trusting example.com links:
$ ./skipfish -m 5 -L -W- -o output_dir -b ie -B example.com http://www.example.com/
Heavy brute force only (no HTML link extraction), limited to a single directory and timing out after 5 seconds:
$ touch new_dict.wl
$ ./skipfish -S dictionaries/complete.wl -W new_dict.wl -P -I http://www.example.com/dir1/ \
-o output_dir -t 5 -I http://www.example.com/dir1/
For a short list of all command-line options, try ./skipfish -h. A quick primer on some of the particularly useful options is also http://lcamtuf.blogspot.com/2010/11/understanding-and-using-skipfish.html‘>given here.