
The Crawling Module And Web Pages


With the first approach, a collection can hold several copies of each web page, grouped according to the crawl in which they were found. With the second approach, only the most recent copy of each web page is saved. For this, one has to maintain records of when each web page changed and how frequently it changes. This technique is more efficient than the previous one, but it requires an indexing module to run alongside the crawling module. The authors conclude that an incremental crawler can bring in fresh copies of web pages more quickly and keep the repository fresher than a periodic crawler.
III. CRAWLING TERMINOLOGY
The web crawler keeps a list of unvisited URLs, which is called the frontier. The list is initialized with start (seed) URLs, which may be given by a user or by another program. Each crawling loop involves selecting the next URL to crawl from the frontier, fetching the web page that corresponds to that URL, parsing the retrieved page to extract its URLs and any application-specific information, and finally adding the unvisited URLs to the frontier. The crawling process may be terminated when a specified number of web pages have been crawled. The WWW can be viewed as a huge graph with web pages as its nodes and links as its edges. A crawler starts at a few of the nodes and then follows the edges to reach other nodes. The process of fetching a web page and extracting the links within it is analogous to expanding a node in graph search. A topical crawler tries to follow edges that are expected to lead to portions of the graph that are relevant to a given topic.
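The crawling loop described above can be sketched as a graph traversal. This is a minimal illustration rather than the authors' implementation: the names `TOY_WEB`, `crawl`, `get_links` and `max_pages` are hypothetical, and a small in-memory dictionary stands in for the real Web so that the loop can be run without network access.

```python
from collections import deque

# Hypothetical toy "web": each URL (node) maps to the links (edges) on that page.
TOY_WEB = {
    "a": ["b", "c"],
    "b": ["c", "d"],
    "c": ["a"],
    "d": [],
}


def crawl(seeds, get_links, max_pages):
    """Run the crawling loop until the frontier is empty or max_pages is reached."""
    frontier = deque(seeds)   # unvisited URLs, initialized with the seed URLs
    seen = set(seeds)         # every URL that has ever been put on the frontier
    crawled = []              # URLs in the order they were crawled

    while frontier and len(crawled) < max_pages:
        url = frontier.popleft()      # select the next URL from the frontier
        links = get_links(url)        # stands in for "fetch the page and parse it"
        crawled.append(url)
        for link in links:            # add only URLs not seen before
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return crawled


order = crawl(["a"], lambda url: TOY_WEB.get(url, []), max_pages=3)
print(order)
```

Here the stopping rule is the page budget `max_pages`, matching the termination condition mentioned in the text; a real crawler would replace `get_links` with actual fetching and parsing.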
Frontier:
The crawling process starts with a seed URL, extracts links from it, and adds them to a list of unvisited URLs. This list of unvisited URLs is known as the frontier. The frontier is essentially the to-do list of a web crawler: it contains the URLs of web pages that have not yet been visited. The frontier may be implemented as a FIFO queue, in which case we have a breadth-first crawler that can be used to blindly search the Web. The URL to be crawled next is taken from the head of the queue, and new URLs are added to the tail of the queue.
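A FIFO frontier of this kind might look as follows. This is only a sketch under stated assumptions: the class name `Frontier` and its methods `add` and `next_url` are hypothetical, and the set used to reject URLs that were already queued is an addition that the text does not spell out.

```python
from collections import deque


class Frontier:
    """FIFO frontier: URLs leave from the head and new URLs join at the tail."""

    def __init__(self, seeds):
        self._queue = deque()
        self._known = set()   # assumption: remember every URL ever queued
        for url in seeds:
            self.add(url)

    def add(self, url):
        """Queue url at the tail; return False if it was already known."""
        if url in self._known:
            return False
        self._known.add(url)
        self._queue.append(url)
        return True

    def next_url(self):
        """Remove and return the URL at the head of the queue."""
        return self._queue.popleft()

    def __len__(self):
        return len(self._queue)


frontier = Frontier(["http://example.com/"])
frontier.add("http://example.com/a")
frontier.add("http://example.com/a")   # duplicate, ignored
first = frontier.next_url()
```

Because URLs are served in the order they were discovered, a crawler built on this queue visits pages breadth-first.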
Fetching:
To obtain a web page, the client sends an HTTP request for that page and reads the response. Timeouts must be set for each page or web server to make sure that an unnecessary amount of time is not spent on slow web servers or on reading large web pages.
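The fetching step could be sketched with Python's standard `urllib.request` module. The function name `fetch_page` and the default values for `timeout` and `max_bytes` are assumptions made for illustration; the text only states that slow servers and large pages should be guarded against.

```python
import urllib.request


def fetch_page(url, timeout=10.0, max_bytes=1_000_000):
    """Return at most max_bytes of the response body, or None on failure.

    The timeout applies to each blocking network operation (connecting,
    reading), not to the total duration of the download.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            # Reading a bounded number of bytes keeps very large pages
            # from consuming unbounded time and memory.
            return response.read(max_bytes)
    except (OSError, ValueError):
        # OSError covers URLError, timeouts and connection errors;
        # ValueError is raised for strings that are not valid URLs.
        return None


# A data: URL is used here so the example runs without network access.
sample = fetch_page("data:text/plain,hello%20world", max_bytes=5)
```

Returning `None` on failure lets the crawling loop simply skip a page that could not be fetched and move on to the next URL in the frontier.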
Parsing:
Once a web page has been fetched, its content is parsed to extract information that will feed, and possibly direct, the future path of the web crawler. Parsing may involve simple URL extraction from HTML pages, or it may involve the more difficult process of tidying up the HTML content.
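The simpler case, URL extraction, can be illustrated with the standard `html.parser` module. The names `LinkExtractor` and `extract_links` are hypothetical, and resolving relative links against the page URL and dropping `#fragment` parts are assumptions about what a crawler would typically want, not steps described in the text.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkExtractor(HTMLParser):
    """Collect the href targets of <a> tags as absolute URLs."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links, then drop any #fragment part.
                absolute, _fragment = urldefrag(urljoin(self.base_url, value))
                self.links.append(absolute)


def extract_links(html, base_url):
    """Return the links found in html, resolved against base_url."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    return parser.links


page = '<p><a href="/about">About</a> <a href="news.html#top">News</a></p>'
links = extract_links(page, "http://example.com/index.html")
print(links)
```

The extracted URLs are what the crawler would then offer to the frontier.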
IV. PROPOSED WORK
The functioning of Web crawler [10]...
