WebArchiv contains 60.3 terabytes of data. Harvesting began on 3/9/2001.
The following websites were recently added to WebArchiv:
Mineralogist.cz: minerály České republiky (Minerals of the Czech Republic)
Městská knihovna a infocentrum Smržovka (Smržovka Town Library and Information Centre)
Around the World in 2 Billion Pages
Institutions from over 60 countries joined the Internet Archive's global crawl of 2 billion web pages, run last year. WebArchiv contributed 707 seeds. See the list of participating countries and institutions.
We have finished a comprehensive harvest of domestic web resources (the .cz domain). The 2007 collection contains 81,300,000 documents (3.6 TB).
Web Cultural Heritage
The latest issue (no. 15) of DPC/PADI What's New in Digital Preservation features the Web Cultural Heritage project, which we led, in its Web archiving section.
New thematic harvest:
Prague will bid for the 2016 Olympic Games.
Web Cultural Heritage
The one-year project Web Cultural Heritage ended in September 2006. The Commission's financial services approved the final claim assessment, which was based on the final report and the project finances.
What is WebArchiv?
WebArchiv is a digital archive of Czech web resources collected for
long-term preservation. The National Library of the Czech Republic, in
cooperation with the Moravian Library and the Institute of Computer
Science of Masaryk University, has been organizing the preservation of these
documents since 2000. Web archiving relies on tools developed by the Internet
Archive and the International Internet Preservation Consortium (IIPC).
WebArchiv has been a member of the IIPC since 2007.
What is the purpose of web archiving?
- The need to preserve non-print documents of cultural, artistic and
historic value for future generations
- Enormous growth of electronic online resources published solely on
the Internet
- The ephemeral nature of electronic resources – valuable documents can
be irretrievably lost
What resources can be found in WebArchiv?
- Digital documents freely available via the Internet
- Publications with a research or artistic focus, news and current affairs
- Periodicals, monographs, conference papers, research and other reports,
scholarly publications, etc.
- Textual documents, and to some extent visual and audio documents, that
exist only in digital form
The aim is to archive everything that has ever been published on the Czech
web. However, this goal is technically unreachable, and not all resources
published on the web are suitable for archiving by their nature (e.g.
promotional material). For these reasons, archiving follows three
paths: