Visualization Of Hyperlinks Essay

Contents
* Prologue
* Web-Spiders task description
* Preparation
* Java
* What could go wrong?
* Conclude
* Used materials

Prologue

Hyperlinks make navigation on the internet possible. Web pages can be linked to other pages, and one important question is which other pages a given page links to. The resulting link structure can be visualized as a directed graph: the nodes denote the individual web pages, and the arrows connecting the nodes represent hyperlinks pointing from one page to another.

To develop such an application, one has to go through the HTML code of the page at the given link, look for anchor tags (<a>) with an href attribute, and cut out each link. This step is the most important one; the following steps only manipulate the collected URLs. Since we need not just the URLs of one page but all URLs connected to the given page, we have to go through each of those pages once more. Finally, the gathered links are merged and passed to an application that draws a graph from them. This is a very rough outline of what has to be done to build a web spider.

The next chapters give step-by-step instructions for building a web spider and describe the problems that may appear along the way.

Web-Spiders task description

Develop a web spider, that is, an application that takes the URL of a web page as input and extracts all hyperlinks of this page. For each web page that can be reached by these hyperlinks, again all hyperlinks are to be extracted. The result should be visualized as a directed graph. For the visualization you can use the Graphviz tool, which takes a graph description in the DOT format as input and draws the graph as an SVG image. The following picture shows the HTML code of the site and the resulting graph:

Preparation

For a good start, I first read a few articles about the Java Swing HTML parser, and then searched the internet for code examples and studied them.
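The Swing HTML parser is easiest to understand from a small example. The sketch below is not the author's code: it assumes the page's HTML is already available as a Reader, and the class name LinkExtractor is hypothetical. It uses ParserDelegator together with an HTMLEditorKit.ParserCallback and collects the href attribute of every <a> start tag, which is the "look for tags and cut the links out" step described in the prologue.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

public class LinkExtractor {

    /** Collects the href value of every <a> start tag in the given HTML (hypothetical helper). */
    public static List<String> extractLinks(Reader html) throws IOException {
        List<String> links = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                // Only anchor tags carry hyperlinks; anchors without href (e.g. <a name>) are skipped.
                if (tag == HTML.Tag.A) {
                    Object href = attrs.getAttribute(HTML.Attribute.HREF);
                    if (href != null) {
                        links.add(href.toString());
                    }
                }
            }
        };
        // true = ignore any charset declared inside the document; the Reader is already decoded text.
        new ParserDelegator().parse(html, callback, true);
        return links;
    }

    public static void main(String[] args) throws IOException {
        String page = "<html><body>"
                + "<a href=\"a.html\">A</a> <p>text</p>"
                + "<a href=\"http://example.com/b.html\">B</a>"
                + "</body></html>";
        System.out.println(extractLinks(new StringReader(page)));
    }
}
```

Run on the sample string in main, this prints [a.html, http://example.com/b.html]. The links come back exactly as written in the page, so relative links such as a.html still have to be resolved against the page URL in a later step.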
The second step was studying the Graphviz tool. I read the documentation in order to understand what Graphviz is about and what the syntax of the DOT language looks like. In the end I found that Graphviz is quite a simple application for drawing graphs, and not only the simple graphs I was looking for but dozens of different sophisticated kinds. Unfortunately, I cannot say the same about the Java Swing HTML classes (javax.swing.text.html). I read many articles describing them, but they remained confusing to me; the code examples I found shed much more light on them. At that point I decided that my preparation was over and it was time to start programming. The preparation took just a few days, and now, after finishing this project, I would always recommend a little preparation: it helps a lot.

Java

To make development a little easier and less confusing, I divided my Java application into six stages:
* Making the GUI
* Getting links
* Making links suitable for my project
* Handling links (a method that coordinates when to use the other methods)
* Adding links to a DOT-formatted file
* Showing links as a graph

Let's examine those stages...
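Three of the stages listed above (making links suitable, handling links, and adding links to a DOT file) can be sketched together. This is a hedged illustration, not the author's implementation: the class SpiderSketch and its methods normalize, crawl and toDot are hypothetical names, the crawl is assumed to be a breadth-first search with a depth limit, and downloading plus parsing a page is abstracted as a function so that an in-memory map can stand in for the real web. Resolving relative links with java.net.URI, dropping #fragments and keeping only http/https links are likewise assumptions about what "suitable" means.

```java
import java.net.URI;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

public class SpiderSketch {

    /** Hypothetical "make links suitable" step: absolute http(s) URL without fragment, or null to skip. */
    public static String normalize(String pageUrl, String href) {
        try {
            URI resolved = URI.create(pageUrl).resolve(href.trim());
            String scheme = resolved.getScheme();
            if (scheme == null || !(scheme.equalsIgnoreCase("http") || scheme.equalsIgnoreCase("https"))) {
                return null; // skip mailto:, javascript: and similar links
            }
            // Drop the #fragment so a.html and a.html#top count as the same page.
            return resolved.toString().split("#", 2)[0];
        } catch (IllegalArgumentException e) {
            return null; // malformed href: skip it instead of aborting the crawl
        }
    }

    /** Hypothetical "handle links" step: breadth-first crawl, recording page -> outgoing links. */
    public static Map<String, Set<String>> crawl(String start, int maxDepth,
                                                 Function<String, List<String>> fetchLinks) {
        Map<String, Set<String>> graph = new LinkedHashMap<>();
        Map<String, Integer> depth = new LinkedHashMap<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(start);
        depth.put(start, 0);
        while (!queue.isEmpty()) {
            String page = queue.poll();
            Set<String> targets = new LinkedHashSet<>();
            for (String href : fetchLinks.apply(page)) {
                String url = normalize(page, href);
                if (url == null) continue;
                targets.add(url);
                // Enqueue each page once, and only while the depth limit allows.
                if (!depth.containsKey(url) && depth.get(page) < maxDepth) {
                    depth.put(url, depth.get(page) + 1);
                    queue.add(url);
                }
            }
            graph.put(page, targets);
        }
        return graph;
    }

    /** Hypothetical "add links to a DOT file" step: one quoted edge per hyperlink. */
    public static String toDot(Map<String, Set<String>> graph) {
        StringBuilder dot = new StringBuilder("digraph links {\n");
        for (Map.Entry<String, Set<String>> e : graph.entrySet()) {
            for (String target : e.getValue()) {
                dot.append("  \"").append(quote(e.getKey())).append("\" -> \"")
                   .append(quote(target)).append("\";\n");
            }
        }
        return dot.append("}\n").toString();
    }

    /** In DOT quoted strings, only the double quote needs escaping. */
    private static String quote(String s) {
        return s.replace("\"", "\\\"");
    }

    public static void main(String[] args) {
        // Hypothetical in-memory "web" standing in for real downloading and parsing.
        Map<String, List<String>> web = Map.of(
                "http://example.com/index.html", List.of("a.html", "b.html#top", "mailto:me@example.com"),
                "http://example.com/a.html", List.of("index.html", "c.html"),
                "http://example.com/b.html", List.of());
        Map<String, Set<String>> graph = crawl("http://example.com/index.html", 1,
                url -> web.getOrDefault(url, List.of()));
        System.out.print(toDot(graph));
    }
}
```

With a depth limit of 1, the start page and every page it links to are parsed, which matches the task description; c.html appears only as an edge target. The printed text can be saved as, for example, links.dot and rendered by Graphviz with dot -Tsvg links.dot -o links.svg.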

