About This Project

The Internet Mapping Project

This project exists to publish findings about possibly compromised servers throughout the world. As with everything else on the internet, what you choose to do with this information is solely up to you.

A little bit about combing the internet.

Scraping the internet - *ahem* politely - is messy, forbidding work. Fortunately, some excellent pieces of the puzzle have already been coded for us in the form of the mechanize and urllib2 modules.
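
For the curious, here is a minimal sketch of what the "polite" part might look like with mechanize (a Python 2 era setup, since urllib2 only exists in Python 2). The crawl list and User-Agent string below are placeholders, not the project's actual configuration:

    import time
    import mechanize

    # Placeholder crawl list and User-Agent - not the project's real configuration.
    urls = ['http://example.com/', 'http://example.org/']

    br = mechanize.Browser()
    br.set_handle_robots(True)   # honour robots.txt: the "polite" part
    br.addheaders = [('User-Agent', 'internet-mapping-bot')]

    pages = []
    for url in urls:
        try:
            response = br.open(url, timeout=10.0)
            pages.append((url, response.read()))
        except Exception as e:
            # Dead hosts, refused connections and robots.txt denials are routine at this scale.
            print('skipped %s: %s' % (url, e))
        time.sleep(2)   # self-imposed delay between requests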

This is what the software does at a very high level in order to find information to publish to this blog:

first grab -> big data (internet) -> aggregate -> verify -> publish -> analyse -> learn

second grab -> big data (internet) -> aggregate -> verify -> publish -> analyse -> learn
.
.
.
nth grab -> big data (internet) -> aggregate -> verify -> publish -> analyse -> learn

As 'n' increases, so does the software's ability to complete each step, because every pass gives it more to learn from. The principle is very simple: the software should learn which sites yield the best or newest data and prioritize those sites on later passes, so the whole process gets faster.
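
For what it's worth, here is a tiny sketch of that prioritization idea. The site names and the scoring scheme are made up for illustration; the real software may track this very differently:

    from collections import defaultdict

    # Hypothetical "learn" step: sites that keep yielding new data get
    # grabbed first on the next pass.
    scores = defaultdict(int)   # site -> number of passes that produced new data

    def record_pass(site, found_new_data):
        if found_new_data:
            scores[site] += 1

    def next_grab_order(sites):
        # Highest-scoring sites first, so productive sources are revisited sooner.
        return sorted(sites, key=lambda s: scores[s], reverse=True)

    # After a few passes, example.org has produced new data twice.
    record_pass('example.org', True)
    record_pass('example.org', True)
    record_pass('example.com', False)
    print(next_grab_order(['example.com', 'example.org']))
    # -> ['example.org', 'example.com']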

Anyway, if this made any sense at all, that's great.