Who does your intranet link to?
Have you ever wondered who your intranet links to? I was sitting in a meeting the other day (yes, I know, breaking news) and wondering what would be a fast way to answer that question. The basic sketch I did in my mind was simple:

1. Set up a web crawler for the domain I want to analyze
2. Run the crawling job
3. Get the links collected in the web map
4. Process the links to keep only the site they refer to
5. Remove duplicates
6. Visualize the graph

Simple, isn’t it? So, what do I need to get it to work? Basically three pieces of software (a web crawler, a graph manipulation library, and a visualization package) and some glue. Going over the things I have been playing with for the last year, I drew three candidates: Nutch, RDFlib, and prefuse. Oh, the glue will be just two Python scripts. ...
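To make the "keep only the site" and "remove duplicates" steps of the sketch concrete, here is a minimal Python illustration. It is not the actual glue script: the input format (a list of source URL and target URL pairs) and the names `site_of` and `site_graph` are assumptions made up for this example, and dropping links that stay inside the same site is just one possible choice.

```python
from urllib.parse import urlsplit


def site_of(url):
    """Return the host part of an absolute URL, or None if there is none.

    urlsplit() already lowercases the hostname; relative links and
    schemes such as mailto: have no hostname and yield None.
    """
    return urlsplit(url).hostname


def site_graph(link_pairs):
    """Collapse page-level links into unique (source site, target site) edges.

    link_pairs is a hypothetical input format: an iterable of
    (source_url, target_url) tuples exported from the crawl.
    """
    edges = set()  # a set removes duplicate edges for free
    for source_url, target_url in link_pairs:
        source = site_of(source_url)
        target = site_of(target_url)
        # Skip links without a host and links that stay on the same site
        # (an assumption: we only care about who the intranet links out to).
        if source and target and source != target:
            edges.add((source, target))
    return edges
```

Calling `site_graph` on the collected links gives a set of site-to-site edges, which is the kind of structure a graph library or a visualization package can take from there.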