Microslicing Operation and Performance

Document created by Corey Bodzin on Dec 14, 2011. Last modified by Corey Bodzin on Dec 14, 2011.

Microslicing improves the efficiency of QualysGuard scanning by breaking a large set of scan targets into small chunks (Microslices) and distributing them round-robin across multiple scanners. This has the effect of more evenly balancing scan load across multiple scanners, and on average reduces total scan time.

 

Example: For this discussion we'll assume that ACME is a company that wants to scan 172.31.0.0/16 (class B) using 4 scanner appliances that have their polling interval set to 180 seconds.  Their network has a lumpy distribution of targets:  172.31.0.1-172.31.63.254 is 100% populated, 172.31.192.1-172.31.254.254 has no hosts, and the other ranges have 50% population in alternating blocks.  We'll also assume that a live IP can be scanned in 60 seconds and that a non-live IP takes only 10 seconds to complete.

 

Also, please note that all times and target distributions are purely theoretical for this example.  There are many factors that impact scan performance - target OS and applications, target vulnerabilities, network conditions, etc. - so actual performance cannot be inferred from this example.

 

First, let's define what Microslicing is.  Prior to the use of Microslicing, jobs were assigned as follows:

 

  1. Get the total number of scanners ("S")
  2. Get all the IP space targeted for a scan ("T")
  3. Allocate the scanning so that each scanner gets a contiguous block of T/S targets.
  4. Scan!

 

For example, if we targeted 172.31.0.0/16 (T=65,000) for a scan job with 4 scanners (S=4) then each scanner would be assigned approximately 16,000 IPs to scan.

 

Scanner 1:  172.31.0.1-172.31.63.254 is 100% live and takes 11 days (16000 minutes) to scan

Scanner 2:  172.31.64.1-172.31.127.254 is 50% live and takes 6.5 days (9333 minutes) to scan

Scanner 3:  172.31.128.1-172.31.191.254 is 50% live and takes 6.5 days (9333 minutes) to scan

Scanner 4:  172.31.192.1-172.31.254.254 is 100% empty and takes 1.8 days (2666 minutes) to scan

 

The job wouldn't be done until poor Scanner 1 was finished with its work in 11 days.  Scanner 4 would be sitting idle for 9 days.
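
For the curious, here is a minimal sketch in Python (illustrative only, not Qualys code) of what the old T/S split costs under this example's assumptions: 60 seconds per live IP, 10 seconds per non-live IP, and four contiguous blocks of 16,000 IPs with the liveness mix described above.

LIVE_SEC, DEAD_SEC = 60, 10    # assumed seconds per live / non-live IP (from the example)

def block_minutes(total_ips, live_fraction):
    # Minutes for one scanner to finish a contiguous block of targets.
    live = int(total_ips * live_fraction)
    dead = total_ips - live
    return (live * LIVE_SEC + dead * DEAD_SEC) / 60

block_liveness = [1.0, 0.5, 0.5, 0.0]                  # Scanners 1 through 4
times = [block_minutes(16_000, f) for f in block_liveness]
print([round(t) for t in times])                       # [16000, 9333, 9333, 2667] minutes
print(round(max(times) / 60 / 24, 1))                  # 11.1 days -- the job waits on Scanner 1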

 

Microslicing improves this situation in the following way:

 

  1. Get the total number of scanners ("S")
  2. Get all the IP space targeted for a scan ("T")
  3. Determine how many 4,000-target microslices there are ("M" = T/4000 full slices, plus one leftover slice of "T modulo 4000" targets if there is a remainder)
  4. Hand out a slice to each scanner until all scanners have a slice
  5. Scan slices!
  6. When a scanner completes its slice, it asks for another slice the next time it polls (on average half the polling interval later) until all slices are gone.  A sketch of this pull-based dispatch follows the list.
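
To make steps 3-6 concrete, here is a minimal sketch in Python (hypothetical names, not QualysGuard code): the targets are cut into slices of at most 4,000 IPs, and whichever scanner is idle first pulls the next slice until none remain (the polling delay is ignored for simplicity).

from collections import deque

SLICE_SIZE = 4_000

def make_slices(targets):
    # Step 3: break the target list into microslices of at most SLICE_SIZE IPs.
    return deque(targets[i:i + SLICE_SIZE] for i in range(0, len(targets), SLICE_SIZE))

def run_job(targets, scanner_count, slice_minutes):
    # Steps 4-6: each scanner pulls the next slice as soon as it is free;
    # the job ends when the last slice completes.
    slices = make_slices(targets)
    idle_at = [0.0] * scanner_count                              # minutes at which each scanner is next idle
    while slices:
        s = min(range(scanner_count), key=idle_at.__getitem__)   # earliest-idle scanner asks for work
        idle_at[s] += slice_minutes(slices.popleft())
    return max(idle_at)

# Example use: 65,000 targets, 4 scanners, 1 minute per IP (purely illustrative)
# print(run_job(list(range(65_000)), 4, lambda sl: len(sl)))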

 

 

Using the same example as above we come up with a very different calculation.

 

T=65000

S=4

M = 17 (16 slices of 4,000 targets, plus 1 slice of 1,000 targets)
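
As a quick check of the slice-count arithmetic (illustrative only):

T, SLICE_SIZE = 65_000, 4_000
full_slices, leftover = divmod(T, SLICE_SIZE)
M = full_slices + (1 if leftover else 0)
print(M, full_slices, leftover)    # 17 16 1000 -- 16 full slices plus one 1,000-target slice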

 

At the start of the job (0 days) we would hand out the following work:

 

Scanner 1:  172.31.0.1-172.31.15.254 (really less, but I'm rounding for simplicity) is 100% live and takes 2.7 days (4000 minutes) to scan SLICE #1

Scanner 2:  172.31.16.1-172.31.31.254 is 100% live and takes 2.7 days (4000 minutes) to scan SLICE #2

Scanner 3:  172.31.32.1-172.31.47.254 is 100% live and takes 2.7 days (4000 minutes) to scan SLICE #3

Scanner 4:  172.31.48.1-172.31.63.254 is 100% live and takes 2.7 days (4000 minutes) to scan SLICE #4

 

At the end of 2.7 days they all ask for more jobs.

 

Scanner 1:  172.31.64.1-172.31.79.254 is 100% live and takes 2.7 days (4000 minutes) to scan SLICE #5

Scanner 2:  172.31.80.1-172.31.95.254 is 100% empty and takes .5 days (666 minutes) to scan SLICE #6

Scanner 3:  172.31.96.1-172.31.111.254 is 100% live and takes 2.7 days (4000 minutes) to scan SLICE #7

Scanner 4:  172.31.112.1-172.31.127.254 is 100% empty and takes .5 days (666 minutes) to scan SLICE #8

 

At the end of 3.2 days both Scanner #2 and Scanner #4 are done and ready to ask for more slices, while Scanners #1 and #3 still have 2.2 days more work to do on their slices.

 

Scanner 2:  172.31.128.1-172.31.143.254 is 100% live and takes 2.7 days (4000 minutes) to scan SLICE #9

Scanner 4:  172.31.144.1-172.31.159.254 is 100% empty and takes .5 days (666 minutes) to scan SLICE #10

 

At the end of 3.7 days scanner #2 is now chugging away on slice #9 while scanner #1 is still working slice #5 and scanner #3 is still working on slice #7.  Scanner #4 is done and asks for more work.

 

Scanner 4:  172.31.160.1-172.31.175.254 is 100% live and takes 2.7 days (4000 minutes) to scan SLICE #11

 

At the end of 5.4 days both scanner #1 and scanner #3 finish their work and ask for more:

 

Scanner 1:  172.31.176.1-172.31.191.254 is 100% empty and takes .5 days (666 minutes) to scan SLICE #12

Scanner 3:  172.31.192.1-172.31.207.254 is 100% empty and takes .5 days (666 minutes) to scan SLICE #13

 

At the end of 5.9 days scanner #2 finishes its work (slice #9) just as scanners #1 and #3 are done with theirs; they all get more work:

 

Scanner 1:  172.31.208.1-172.31.223.254 is 100% empty and takes .5 days (666 minutes) to scan SLICE #14

Scanner 2:  172.31.224.1-172.31.239.254 is 100% empty and takes .5 days (666 minutes) to scan SLICE #15

Scanner 3:  172.31.240.1-172.31.254.254 is 100% empty and takes .5 days (666 minutes) to scan SLICE #16

 

Scanner #4 is finally done at 6.4 days, at the same time as the other 3 finish; one of them would pick up the final slice (#17) and the job would finish.  I'm not reflecting that here because of the rounding I did above (the slices were really 4,094 targets rather than 4,000).  The scan completes in only 58% of the time required previously (6.4 days versus 11 days) by virtue of keeping all the scanners busy until all the work is done.
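
If you want to replay the whole walkthrough, here is a self-contained sketch in Python (again purely illustrative, using the example's assumed per-IP times) that hands the 17 slices to whichever scanner is idle first.  Because it doesn't round to 2.7-day and 0.5-day steps, and it does include the final 1,000-IP slice, it lands at roughly 6.6 days rather than 6.4, still well under the 11 days of the contiguous split.

LIVE_MIN, DEAD_MIN = 60 / 60, 10 / 60      # assumed minutes per live IP and per non-live IP

# Slice liveness in target order: #1-#4 live, #5-#12 alternating live/empty,
# #13-#16 empty, #17 is the 1,000-IP leftover (empty).
slices = ([(4_000, 1.0)] * 4
          + [(4_000, 1.0), (4_000, 0.0)] * 4
          + [(4_000, 0.0)] * 4
          + [(1_000, 0.0)])

def slice_minutes(size, live_fraction):
    live = int(size * live_fraction)
    return live * LIVE_MIN + (size - live) * DEAD_MIN

idle_at = [0.0] * 4                              # next-idle time (minutes) for each of the 4 scanners
for size, frac in slices:
    s = min(range(4), key=idle_at.__getitem__)   # the earliest-idle scanner pulls the next slice
    idle_at[s] += slice_minutes(size, frac)

print(round(max(idle_at) / 1440, 1))             # ~6.6 days, versus ~11.1 days for the T/S split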

 

Again, please note that all times and target distributions are purely theoretical for this example.  There are many factors that impact scan performance - target OS and applications, target vulnerabilities, network conditions, etc. - so actual performance cannot be inferred from this example.

 

There are a few other points that are worth mentioning about Microslicing and its future:

 

  • We currently have a minimum slice size of 100 and a maximum size of 4,000; in the future we may adjust these settings.
    • Consequently, smaller jobs won't see a huge impact from Microslicing.  For example, if you're only scanning 12,000 targets with 3 scanners then performance really won't be any different than under the previous model (each appliance would get 1 chunk of 4,000 targets to scan either way).
  • Remember that the total amount of traffic required for a microsliced job isn't reduced, which means that the average bandwidth usage will go up.  In the above example we've reduced the scan time from 11 days to 6.4, but the same amount of traffic T is generated, so the average utilization will be nearly double (T/6.4 rather than T/11); see the quick calculation after this list.
  • There is currently a limit of 100 slices for a job (the maximum slice size is increased to keep the number of slices at or below 100) that may be changed in the future.  This limit is designed to keep the overhead of job creation (remember, scanners have to poll for slices which can take several minutes) at a good balance with improved performance.
  • All this applies to host-based scanning (VM/PC/FDCC).  QualysGuard WAS is architected differently (URLs) and does not follow this behavior.
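
A quick back-of-the-envelope check on the bandwidth point above (plain arithmetic based on the example's durations):

old_days, new_days = 11.0, 6.4
print(round(old_days / new_days, 2))    # ~1.72 -- the same traffic in about 58% of the time,
                                        # so average utilization is roughly 1.7x (nearly double)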

 

Qualys continues to refine Microslicing so that QualysGuard can remain the most scalable scanning solution available.
