Krwkrw is a web crawler/scraper... scraper is more apt, actually.
If using Maven as your build tool, you can add it to your project via:
<dependency>
    <groupId>com.blogspot.geekabyte.krwkrw</groupId>
    <artifactId>krwler</artifactId>
    <version>0.1.2</version>
</dependency>
If using Gradle, then:
dependencies { compile "com.blogspot.geekabyte.krwkrw:krwler:0.1.2" }
A quick rundown of what is worth mentioning in this release:
- Three new utility classes that make it easy to store crawled web pages in a relational database, in ElasticSearch, or in a CSV file.
- It is now possible to register a callback that is fired when the crawling operation terminates. This should be most useful when crawling in Async mode.
- Fixed a bug where broken links were crawled multiple times.
- General improvements to the API, tests, etc.
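To give a feel for the exit-callback idea from the list above, here is a minimal sketch in plain Java. It does not use Krwkrw's actual API (the Readme covers that); the class `CrawlExitCallbackSketch` and the methods `crawl` and `crawlAsync` are made up for illustration, and only JDK classes are involved.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

// Illustration only: NOT the Krwkrw API, all names here are hypothetical.
public class CrawlExitCallbackSketch {

    // Stand-in for a real crawl: "visits" each URL and returns the visited list.
    static List<String> crawl(List<String> urls) {
        List<String> visited = new CopyOnWriteArrayList<>();
        for (String url : urls) {
            visited.add(url);
        }
        return visited;
    }

    // Runs the crawl on a background thread and fires onExit once it terminates.
    static CompletableFuture<Void> crawlAsync(List<String> urls,
                                              Consumer<List<String>> onExit) {
        return CompletableFuture.supplyAsync(() -> crawl(urls)).thenAccept(onExit);
    }

    public static void main(String[] args) {
        CompletableFuture<Void> done = crawlAsync(
                Arrays.asList("http://example.com/a", "http://example.com/b"),
                visited -> System.out.println(
                        "Crawl finished, pages visited: " + visited.size()));

        // The caller is free to do other work here while the crawl runs.

        // Wait before exiting so the background task is not cut short.
        done.join();
    }
}
```

The point of the callback is that the caller does not have to block or poll to find out when an asynchronous crawl is over; the `join()` at the end is only there so this small demo does not exit early.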
You can see the Readme for more information. And yeah, the 'a's were dropped from the name, from Krawkraw to Krwkrw, because all-consonant names sound cooler.
For the background story on how Krwkrw came to be, please read "A web scraper/crawler in Java: Krawkraw".