The Internet Mapping Project
This project publishes findings about possibly compromised servers throughout the world. As with everything else on the internet, what you choose to do with this information is solely up to you.
A little bit about combing the internet.
Scraping the internet - *ahem* politely - is messy, forbidding work. Fortunately, some excellent pieces of the puzzle are already coded for us in the form of the mechanize and urllib2 modules.
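"Politely" usually means identifying yourself and rate-limiting your requests. A minimal sketch of that idea is below, written for Python 3 (where urllib2 lives on as urllib.request); the class name, User-Agent string, and delay value are illustrative assumptions, not settings from this project.

```python
import time
import urllib.request

class PoliteFetcher:
    """Fetch URLs with a User-Agent header and a minimum delay between requests."""

    def __init__(self, user_agent="internet-mapping-sketch/0.1", min_delay=1.0):
        self.user_agent = user_agent   # identify ourselves to server operators
        self.min_delay = min_delay     # seconds to wait between consecutive requests
        self._last_request = 0.0

    def _throttle(self):
        # Sleep just long enough to honour the minimum delay between requests.
        elapsed = time.monotonic() - self._last_request
        if elapsed < self.min_delay:
            time.sleep(self.min_delay - elapsed)
        self._last_request = time.monotonic()

    def fetch(self, url, timeout=10):
        self._throttle()
        req = urllib.request.Request(url, headers={"User-Agent": self.user_agent})
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.read()
```

Checking robots.txt (e.g. with urllib.robotparser) before fetching would be the other half of politeness.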
first grab -> big data (internet) -> aggregate -> verify -> publish -> analyse -> learn
second grab -> big data (internet) -> aggregate -> verify -> publish -> analyse -> learn
...
nth grab -> big data (internet) -> aggregate -> verify -> publish -> analyse -> learn
As 'n' increases, the software gets better at each step. The principle is simple: the software should learn which sites contain the best or newest data. Prioritize those sites and each pass through the process gets faster.
Anyway, if this made any sense at all, that's great.