• What will a Buildap web crawler application look like?

Posted by havoyan at 2004-11-16 12:22
Buildap's goal is to build applications visually, without programming. Any application should therefore be built from reusable, generic parts. If a design uses an application-specific part, that part should be a device that can itself be broken down into generic parts. For a web crawler, one possible design contains HttpClient, LinkExtractor, Stack, and Writer parts. LinkExtractor can in turn be composed from either an XPath component or a RegularExpression part, depending on whether we want to extract links with an XPath search or with regular expressions. The logic is as follows. First, the starting URL is put into the Stack part. HttpClient then takes the next URL from the Stack and fetches the page from the Internet, and the page is passed to Writer. In parallel, the page is also forwarded to LinkExtractor, which puts every link URL it finds into the Stack. As soon as Writer has written a page, it fires a success event, which is connected to the Stack part, so the cycle repeats. We can also add a counter or timer part so that the crawl finishes after a certain number of pages or at a given time.
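
For readers who think in code, the wiring described above could be sketched roughly as follows. This is not Buildap itself, which is visual, but a plain Python analogue under several assumptions: the function names (http_client, link_extractor, crawl) and the max_pages counter are hypothetical, the regular-expression variant of LinkExtractor is used, the Writer is just an in-memory dictionary, and a "seen" set is added to avoid refetching pages even though the post does not mention one. The loop iteration stands in for the Writer's success event that restarts the cycle.

```python
import re
from urllib.parse import urljoin
from urllib.request import urlopen

# Simple pattern for href attributes; an assumption, not a full HTML parser.
HREF_RE = re.compile(r'href=["\'](.*?)["\']', re.IGNORECASE)


def http_client(url):
    """HttpClient part: fetch a page and return its text."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def link_extractor(base_url, page):
    """LinkExtractor part (regular-expression variant): absolute link URLs."""
    return [urljoin(base_url, href) for href in HREF_RE.findall(page)]


def crawl(start_url, fetch=http_client, max_pages=10):
    """Run the Stack -> HttpClient -> Writer / LinkExtractor cycle."""
    stack = [start_url]   # Stack part, seeded with the first URL
    seen = {start_url}    # assumption: skip URLs already queued
    written = {}          # Writer part: maps each URL to its page text

    # Counter part: stop after max_pages pages have been written.
    while stack and len(written) < max_pages:
        url = stack.pop()
        try:
            page = fetch(url)
        except (OSError, ValueError):
            continue      # unreachable or malformed URL: move on
        written[url] = page  # Writer stores the page; the loop then repeats
        for link in link_extractor(url, page):
            if link not in seen:
                seen.add(link)
                stack.append(link)
    return written
```

The fetch parameter lets the HttpClient part be swapped for a stub, which mirrors the idea of replacing one generic part with another without touching the rest of the design. For example, crawl("http://example.com/", max_pages=5) would fetch at most five pages starting from that URL.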