Polona/Labs

Technological process of digitizing library collections


One of the main objectives of digitization is to give users access to printed resources stored in libraries from anywhere in the world, without having to leave home.

Digitization takes place on various scales. In small institutions, all the activities this process entails may be performed by one or two people, while larger projects require the involvement of several dozen people or even more. Therefore, the first step before starting the work is to plan the entire technological process and assign responsibilities for individual tasks.

In practice this planning is very difficult, because it must take into account the different expectations, problems and obstacles that may arise throughout the digitization process, even with the most scrupulous preparations and refined plans in place.

The process of digitization itself may seem very simple. In principle, it involves converting documents from analogue to digital form. In the case of prints, it entails taking a picture with a digital camera or making a scan with a scanner and saving the result on a disk. Digitization of sound documents (music, audio books) requires converting the analogue signal into a digital signal using dedicated software. The sound recorded on a phonograph record, magnetic tape, etc. is played using a suitable device, such as a reel-to-reel or cassette tape recorder, connected to a computer and recorded in real time. The software installed on the computer converts the sound from an analogue into a digital signal and records it in the latter form. In the case of films, the digitization process is similar to that of sound documents. The only differences are the software and the additional use of a TV card to capture image and sound.
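As a rough illustration of what such conversion software does with sound, the sketch below samples a continuous signal at regular intervals and rounds each sample to an integer. The sample rate, bit depth and test tone are illustrative assumptions, not settings taken from any of the projects mentioned in this text.

```python
import math

def digitize(signal, duration_s, sample_rate_hz=44_100, bit_depth=16):
    """Sample a continuous signal (a function of time returning values in
    [-1.0, 1.0]) and quantize each sample to a signed integer."""
    max_level = 2 ** (bit_depth - 1) - 1          # 32767 for 16-bit audio
    n_samples = int(duration_s * sample_rate_hz)
    samples = []
    for n in range(n_samples):
        t = n / sample_rate_hz                    # time of the n-th sample
        value = max(-1.0, min(1.0, signal(t)))    # clip to the valid range
        samples.append(round(value * max_level))  # quantization step
    return samples

# Hypothetical test tone standing in for the analogue source: a 440 Hz sine.
def tone(t):
    return math.sin(2 * math.pi * 440 * t)

pcm = digitize(tone, duration_s=0.01)
print(len(pcm), "samples, first values:", pcm[:5])
```

A higher sample rate or bit depth yields a more faithful copy at the cost of larger files, which is the same trade-off that applies to scan resolution for paper documents.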

Hence, in simple terms, digitization is an uncomplicated activity which can be performed at home, provided that proper equipment is available. On a small scale, virtually anyone can digitize their own collections, and the process need not be very expensive. However, in institutions planning to digitize collections on a large scale, the process will be more complicated, time-consuming and, obviously, costly.

In this text, I will focus on describing the process of mass digitization of paper documents. I will refer to examples from my practical experience of working on the projects Jagiellońska Biblioteka Cyfrowa [The Jagiellonian Digital Library] (JBC) and Patrimonium – the digitization and release of Polish national heritage from the collections of the National and Jagiellonian libraries (Patrimonium), along with a few other projects in which I had an opportunity to participate.

 The Jagiellonian Digital Library home page

The Digital National Library POLONA home page

The digitization process begins with the categorization and selection of documents. Many factors are taken into account depending on the type of project, including but not limited to collection profile, state of preservation, academic or scientific value, local significance for the town, region or country, and usefulness for the user. For mass digitization, the selection of objects does not need to be so detailed, as the objective is to archive and make available in digital form as many resources as possible. In such a case, the cost may be the only constraint.

First, the collections considered most valuable or most threatened with destruction are selected for digitization. Once documents are available in a digital version, the library can restrict general access to objects of exceptional cultural value, which would otherwise be exposed to damage, and thus preserve the content of the prints. Library materials published in the 19th and 20th centuries also face degradation caused by the technologies applied at that time, as well as by the conditions of storage and access and by environmental pollution. The threat to library collections posed by so-called “acid paper” has been called a silent chemical catastrophe. The only applicable conservation procedure to safeguard such materials is deacidification, which suspends the process of degradation, while digitization makes it possible to share the collections only in digital form from that time on.

The entire mass digitization process implemented in an institution is supported by software controlling the course of individual tasks, facilitating management and decision-making. This may be achieved using one system (e.g. POLONA – Digital Repository of the National Library – RCBN) or several systems concurrently (e.g. Jagiellonian Digital Library – Dlibra and the Jagiellonian Library Service System – SOJBC).

BN Digital Repository [the National Library]

In the following phase of the digitization process, the documents are transferred from an archive or another storage facility for processing. A bibliographic description based on the original document is entered into the library’s computer catalogue. Subsequently, the metadata is imported into the system managing the digitization process. At this point, additional technical information about the document is provided, such as a description of possible defects, damage, condition, dimensions, etc. This information is very useful for the employees performing particular tasks within the digitization process.
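The kind of record described here, bibliographic metadata extended with technical notes, can be sketched as a small data structure. This is a minimal illustration only: the field names and the sample values are hypothetical and do not reflect the actual schema of RCBN, dLibra or SOJBC.

```python
from dataclasses import dataclass, field

@dataclass
class DigitizationRecord:
    """Catalogue metadata plus technical notes (hypothetical field names)."""
    catalogue_id: str
    title: str
    width_mm: int
    height_mm: int
    condition: str = "good"
    defects: list[str] = field(default_factory=list)

    def needs_careful_handling(self) -> bool:
        # Flag documents that scanning staff should treat with extra care.
        return bool(self.defects) or self.condition != "good"

record = DigitizationRecord(
    catalogue_id="example-0001",   # made-up identifier
    title="Example atlas",         # made-up title
    width_mm=420,
    height_mm=594,
    condition="fragile",
    defects=["torn spine", "loose pages"],
)
print(record.needs_careful_handling())
```

Keeping defects and dimensions next to the bibliographic description lets staff at later stages, for instance in the reprographic workshop, see at a glance how a document should be handled.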

Once the document is described, it is forwarded to a reprographic workshop for scanning, and the resultant digital copy is recorded on a disc array.

Nowadays, special collections and large format documents are scanned with special cameras and book scanners, which ensure high quality of the resulting photographs and scans and guarantee the safety of the scanned document (the cold light used in the process does not heat the document, while short exposure reduces the risk of negative impact of light). Such scanners have a low failure rate, but they are very costly. In the past, it was believed that flatbed scanners could be a good alternative, but they proved unsuitable for mass digitization of documents, especially large format prints and very valuable volumes, due to the high risk of damage involved (the document needs to be turned over, placed on the scanner and pressed down with the lid).

The next phase of the digitization process entails three activities which are often performed simultaneously: archiving of the digital copy, conservation of the original document and digital processing of the copy.

Professional archiving is carried out using disc arrays and magnetic tapes managed by tape libraries. Digital archival copies of documents are generated in several copies and stored in a lossless format (e.g. TIFF, RAW) in at least two independent locations. Depending on the solution adopted, a digital copy may be saved on a disk, on a tape or simultaneously on both storage media.
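One common way to make sure that copies kept in separate locations are identical to the master file is to compare checksums. The sketch below assumes this approach; it is not a description of how the libraries mentioned here actually verify their archives, and the directory and file names are placeholders.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path

def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def archive_copy(master: Path, locations: list[Path]) -> str:
    """Copy the master file to every location and verify each copy
    against the master's checksum. Returns the checksum."""
    expected = sha256_of(master)
    for location in locations:
        location.mkdir(parents=True, exist_ok=True)
        copy = location / master.name
        shutil.copy2(master, copy)
        if sha256_of(copy) != expected:
            raise IOError(f"checksum mismatch for {copy}")
    return expected

# Demonstration in a temporary directory with placeholder content.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    master = root / "scan_0001.tif"
    master.write_bytes(b"placeholder bytes standing in for a TIFF scan")
    checksum = archive_copy(master, [root / "disk_array", root / "tape_staging"])
    print(checksum)
```

Storing the checksum alongside the metadata makes it possible to re-verify the archived copies later without access to the original scan.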

If an original document requires conservation, the process is performed in parallel to archiving. A document may also be protected before scanning, if there is a risk of irreparable damage during digitization.

Archival versions should not be subjected to processing that might change their content. However, in order to provide users with a copy of the document in a convenient form better suited to their needs, a working copy can be digitally processed using raster graphics editing software. The main purpose of this processing is to prepare a document for general access online or locally. It involves enhancing the graphic and visual quality of scans, text recognition (OCR – Optical Character Recognition) and conversion into presentation formats (PDF, JPG, PNG, etc.). This is achieved by means of various paid software and freeware. The most popular programs include Adobe Photoshop, Corel PHOTO-PAINT, Scan Tailor, ABBYY FineReader, XnView, Adobe Acrobat and GIMP. Different digital processing techniques are applied to different types of documents. The methods for developing digital copies of old prints and manuscripts, maps or drawings differ from the techniques applied to books or magazines. However, this issue deserves a separate analysis, as it is impossible to present it in a couple of sentences.

Once the file is ready for general access, an additional copy on microfilm is sometimes made. Microfilm copies are particularly durable, but the process is very expensive and is therefore used rather infrequently.

The final stage of digitization involves making the files publicly available on the digital library portal. This requires uploading the scans to the access management system, adding metadata and assigning the documents to particular target collections. The sequence of activities in this phase may vary, depending on the software used.
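The sequence of stages described above can be summarised in a simplified sketch. The stage names are hypothetical labels chosen for illustration, and the three parallel activities (archiving, conservation and digital processing) are collapsed into a single step; a real workflow system such as RCBN or SOJBC will track tasks in far more detail.

```python
from enum import Enum

class Stage(Enum):
    SELECTION = 1                 # choosing documents for digitization
    DESCRIPTION = 2               # bibliographic and technical description
    SCANNING = 3                  # reprographic workshop
    ARCHIVING_AND_PROCESSING = 4  # archiving, conservation, digital processing
    PUBLICATION = 5               # upload to the digital library portal

def advance(stage: Stage) -> Stage:
    """Return the stage that follows the given one."""
    if stage is Stage.PUBLICATION:
        raise ValueError("document is already published")
    return Stage(stage.value + 1)

def remaining_stages(stage: Stage) -> list[Stage]:
    """List the stages a document still has to pass through."""
    return [s for s in Stage if s.value > stage.value]

# Walk one document through the whole process.
stage = Stage.SELECTION
history = [stage]
while stage is not Stage.PUBLICATION:
    stage = advance(stage)
    history.append(stage)
print(" -> ".join(s.name for s in history))
```

Recording the current stage for every document is what allows management software to report how many objects are waiting at each step and where bottlenecks arise.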

Diagram of the mass digitization process implemented in the Jagiellonian Library. Employees of all library departments are involved in performing particular tasks. The dotted line in the diagram indicates the tasks executed directly in the access management system.

This article presents the procedures for mass processing and publication of digital copies. The process described herein may vary in terms of the methods applied or sequence of activities depending on the institution implementing the digitization process. Figure 5 demonstrates that the process of digitization is very labour-intensive, long-lasting and expensive. The process of developing a single copy of a digital document may last from one to seven days. However, the overriding objective of digitization is long-term preservation of documents of particular value for culture and heritage, as well as providing access to library resources to the greatest number of online users possible.

 ◊◊◊

This publication was prepared within the framework of the Competence Centre of the National Library with regard to the digitisation of library resources, co-funded by the Minister of Culture and National Heritage.