Webarchiv content

About archive

Facts

Harvesting began on 3 September 2001.

Comprehensive Harvests


The main aim of comprehensive harvests is to automatically collect as many Czech web resources as possible. The requirements for comprehensive harvests are:

  • Domain – web resources in the Czech domain (.cz) are collected. Resources in other domains can also be harvested, but they have to meet the optional requirements listed below.
Other requirements are optional:
  • Format – which resource formats are harvested depends on the technical settings of the harvester
  • Access – only freely accessible resources are harvested
  • Number of files – a maximum of 5000 files from one domain
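
The requirements above can be read as a simple scope check applied to each candidate URL. The following is only a minimal Python sketch of that idea, not the archive's actual harvester configuration: the function name `in_scope`, its parameters, and the choice to count files per host name are all assumptions made for illustration. The format requirement is left out because it depends on harvester settings that are not described here.

```python
from collections import Counter
from urllib.parse import urlsplit

MAX_FILES_PER_DOMAIN = 5000  # limit stated in the requirements above


def in_scope(url, files_per_domain, freely_accessible=True, allow_other_domains=False):
    """Return True if a URL passes the illustrative scope checks (hypothetical helper)."""
    host = (urlsplit(url).hostname or "").lower()
    if not host:
        return False
    # Domain: .cz resources are collected; other domains only if explicitly allowed.
    if not host.endswith(".cz") and not allow_other_domains:
        return False
    # Access: only freely accessible resources are harvested.
    if not freely_accessible:
        return False
    # Number of files: at most 5000 per domain (assumption: counted per host name).
    if files_per_domain[host] >= MAX_FILES_PER_DOMAIN:
        return False
    files_per_domain[host] += 1
    return True


counts = Counter()
print(in_scope("http://www.example.cz/index.html", counts))
print(in_scope("http://www.example.com/index.html", counts))
```

With these assumptions the first call is accepted and counted, while the second is rejected because the host is outside .cz and other domains were not explicitly allowed.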


Statistics of comprehensive harvests of domestic web resources (domain .cz)

Harvest  Start           Total records in collection  Size of collection (MB)
CZ 2001  September 2001  3 017 058                    106 520
CZ 2002  April 2002      10 272 093                   315 756
CZ 2004  March 2004      32 161 396                   1 058 305
CZ 2005  June 2005       9 336 123                    253 785
CZ 2006  August 2006     70 741 016                   3 465 016
CZ 2007  November 2007   81 300 000                   3 600 000
CZ 2008  November 2008   78 203 483                   3 900 000
CZ 2009  November 2009   178 342 230                  6 600 654
CZ 2010  November 2010   373 178 080                  9 720 367
CZ 2011  November 2011   345 232 271                  10 914 568
CZ 2014  March 2014      151 329 351                  9 812 113
TOTAL:                   1 332 113 301                49 747 084
WebArchiv
Contact: webarchiv@nkp.cz
Last update: 24/2/2017