Heritrix 2.0.2
Heritrix is an open-source, flexible, robust, extensible, and scalable web crawler capable of fetching, archiving, and analyzing the full diversity and breadth of internet-accessible content.
Heritrix (sometimes spelled heretrix, or misspelled or mis-said as heratrix/heritix/heretix/heratix) is an archaic word for heiress (a woman who inherits).
What's New in This Release:
Bug:
· List of classes is not present in select menu for DecideRules
· WARC metadata records should declare MIME-type 'application/warc-fields' (rather than 'text/anvl')
· bottleneck in StatisticsTracker.saveSourceStats?
· META http-equiv refresh content containing only a number misinterpreted as a URI
Improvement:
· ${HOSTNAME} in arc suffix is only replaced completely
· update to BDB-JE 3.3.74
· Update 'public suffix list' (effective_tld_names.dat)
Link: http://kent.dl.../sourceforge/archive-crawler/heritrix-2.0.2-dist.zip