
By the Book

Article Summary #10

In terms of image indexing, this piece is quite venerable – from 1999 – but it still has useful advice for today’s metadata specialists. The author begins by noting the vast increase in the number of images seen and consumed over the last few decades, and argues that it is important that these images be made easily findable on the Internet. One particular need the author identifies is subject access, information that (especially then, but often now too) is rarely provided in sufficient depth. This gap exists for several reasons, including a perceived lack of importance and the difficulty indexers face in defining subjects. That difficulty stems partly from the different levels of analysis needed to assign subjects to a picture. The most basic level, “ofness,” simply states what is actually in the picture. But pictures can often represent more than what they literally contain – their “aboutness.” The article’s illustration of this point is a picture of two wineglasses being clinked together: the “ofness” subject is “glasses,” but the “aboutness” is “celebration.” Assigning “aboutness” requires human interpretation and judgment. To support this, the author suggests several strategies to encourage good subject indexing, including familiarity with intended audiences, strict vocabulary control, consistency in the level of “aboutness” indexing, and the encouragement of experimentation and evaluation.
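To make the “ofness”/“aboutness” distinction concrete, here is a minimal sketch of how an image record might carry both levels of subject access, with a tiny made-up list standing in for real vocabulary control. The field names and terms are my own illustration, not anything from the article.

```python
# Toy illustration of two-level subject indexing for an image record.
# The field names and controlled list are invented for this sketch.

CONTROLLED_TERMS = {"glasses", "celebration", "toasting", "parties"}

image_record = {
    "identifier": "img-0042",
    "title": "Two wineglasses clinking",
    "subjects_of": ["glasses"],         # what is literally depicted ("ofness")
    "subjects_about": ["celebration"],  # what the image represents ("aboutness")
}

def validate_subjects(record, vocabulary):
    """Return any subject terms that are not in the controlled vocabulary."""
    terms = record["subjects_of"] + record["subjects_about"]
    return [term for term in terms if term not in vocabulary]

print("Uncontrolled terms:", validate_subjects(image_record, CONTROLLED_TERMS) or "none")
```

Keeping the two levels in separate fields also makes it easy to stay consistent about how much “aboutness” indexing a collection does.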

This article reminded me how recent digital image indexing is, and how experimental it was within my own lifetime. It also gave useful advice on subject access for if and when I help develop indexing guidelines in my professional life.

Article Summary #9

This article gives an interesting perspective on how experienced catalogers and cataloging techniques are being applied to the creation of metadata for digital collections. This is happening in three main areas: quality control, authority control, and creative cataloging. Quality control is an obvious but important issue in metadata today, especially as many metadata systems currently in use were not originally developed with libraries in mind. Combined with rushed work and undertrained staff, this can often lead to poor-quality metadata. Experienced catalogers, who are used to these kinds of problems, can apply their ingrained attention to completeness, accuracy, and consistency to help address them. This experience is also an important part of bringing authority control to metadata – catalogers have worked with authority files for many years and are familiar with using and creating reliable authority standards (such as the Library of Congress’s). Finally, catalogers help create quality metadata through creative cataloging. The article defines this as the process of creating useful content in the “gray areas” where cataloging rules don’t give explicit instructions. This skill is important for digital collections, which often contain unique items requiring special handling and information. Catalogers can bring their experience to bear here by collaborating with subject specialists and putting extra research into topics to create the best possible records for users.
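As a rough sketch of what cataloger-style quality and authority control might look like when scripted, here is a simple completeness-and-consistency check. The required fields, authority list, and sample records are all invented for illustration; they don’t come from the article.

```python
# Minimal metadata quality check in the spirit of the article:
# completeness (required fields present) and authority control
# (creator names matched against an authorized list). All data is invented.

REQUIRED_FIELDS = {"title", "creator", "date", "rights"}
NAME_AUTHORITY = {"Twain, Mark, 1835-1910", "Austen, Jane, 1775-1817"}

def check_record(record):
    """Return a list of human-readable problems found in one record."""
    problems = []
    missing = REQUIRED_FIELDS - record.keys()
    if missing:
        problems.append(f"missing fields: {sorted(missing)}")
    creator = record.get("creator")
    if creator and creator not in NAME_AUTHORITY:
        problems.append(f"creator not in authority file: {creator!r}")
    return problems

records = [
    {"title": "Adventures of Huckleberry Finn", "creator": "Mark Twain"},
    {"title": "Pride and Prejudice", "creator": "Austen, Jane, 1775-1817",
     "date": "1813", "rights": "Public domain"},
]

for i, rec in enumerate(records):
    print(i, check_record(rec) or "looks OK")
```
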
This article showed how the skills and knowledge catalogers have accumulated over the decades can be helpfully translated into creating quality metadata. (As I want to work in cataloging/metadata, this is an encouraging demonstration.)

Article Summary #8

Cervone’s article discusses at length what he perceives as the different needs and goals of learning object repositories as opposed to traditional digital repositories. One major difference is organization – learning object repositories need to be intuitively navigable by a number of criteria that traditional repositories often don’t support, such as keyword, educational level, and item format. This is made possible in part by specialized metadata assigned to learning objects. Such repositories should also offer social functionality, creating the option of an “informal review process” for the objects they contain; this way, faculty members can comment on their experiences using the different learning objects. Another major need he identifies is for learning objects to be designed for reuse – whether “as is” in a slightly altered context, or as copies made from the original. This also means the objects should be open access or available under a Creative Commons license, and should use standard formats that all users can work with (such as ODF documents and HTML5). All of these requirements, although sometimes partially covered by traditional repositories, generally call for specially designed learning object repository software, such as DOOR, Ariadne, and Rhaptos.
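A hypothetical learning-object record, loosely inspired by the kinds of criteria Cervone describes (keywords, educational level, format, licensing, informal reviews), might look something like the sketch below. The structure and field names are my own invention, not any particular repository’s schema.

```python
from dataclasses import dataclass, field

@dataclass
class LearningObject:
    """Toy learning-object record; fields are illustrative only."""
    title: str
    keywords: list = field(default_factory=list)
    educational_level: str = ""   # e.g. "undergraduate"
    item_format: str = ""         # e.g. "HTML5", "ODF"
    license: str = ""             # e.g. "CC BY 4.0"
    reviews: list = field(default_factory=list)  # informal faculty comments

    def add_review(self, faculty, comment):
        self.reviews.append({"faculty": faculty, "comment": comment})

obj = LearningObject(
    title="Intro to Boolean searching",
    keywords=["information literacy", "searching"],
    educational_level="undergraduate",
    item_format="HTML5",
    license="CC BY 4.0",
)
obj.add_review("Dr. Example", "Worked well in my intro section.")
print(obj)
```
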

Going methodically through and examining all the special characteristics of learning object repositories helped me understand the differences between these and more standard digital repositories, as well as the difficulties that might be involved in trying to adapt mainstream repository software to build a repository of learning objects.

Article Summary #7

This brief piece presents and discusses a number of challenges that digital preservationists have to address when preserving materials. As the author points out at the start, preserving a digital object involves not only saving the file itself, but also ensuring that the infrastructure that makes it accessible remains available. This is addressed through different techniques (migration, emulation, normalization), which have different advantages depending on the resources and needs of the preserving institution.
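For instance, a normalization policy often boils down to a mapping from incoming formats to preferred preservation formats. The toy sketch below (the format choices are illustrative, not recommendations from the article) shows the basic idea.

```python
# Toy normalization policy: map incoming file extensions to a preferred
# preservation format. Choices here are illustrative, not prescriptive.

NORMALIZATION_TARGETS = {
    ".doc": ".odt",   # proprietary word processing -> open document format
    ".docx": ".odt",
    ".jpg": ".tif",   # derivative image -> common preservation master format
    ".tif": ".tif",   # already acceptable as-is
}

def normalization_target(filename):
    """Return the target extension for a file, or None if no rule applies."""
    for ext, target in NORMALIZATION_TARGETS.items():
        if filename.lower().endswith(ext):
            return target
    return None

print(normalization_target("minutes_1998.doc"))  # .odt
print(normalization_target("scan_001.jpg"))      # .tif
```
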

Some challenges discussed include data volume (the sheer amount of digital material produced makes the systems that deal with it more complex and expensive), archivability (choosing what should be kept and what shouldn’t), multiplicity (digital objects are likely to have multiple copies, which makes it less likely that they’ll be lost but more likely that some copies will be preserved in a substandard format), and hardware/storage (the physical media that store digital information, such as floppy disks and USB drives, degrade quickly). Even software can represent a challenge, as newer versions of programs may not open files created with older versions, a problem that ties in with the challenge of differing (sometimes proprietary) file formats. Privacy and legal concerns also arise when thinking about the content being preserved, which might contain personal details and sensitive records. One obvious issue mentioned that I hadn’t thought about was metadata – if it is missing or incomplete, the digital object might be undiscoverable. And finally, of course, all of this takes precious resources, which are in finite supply.

The article gave me a good, basic understanding of the main issues that digital preservationists have to deal with on a daily basis. If I do end up working in digital preservation, it’ll be really useful going in with an accurate idea of what the main problems are.

Article Summary #6

“Ending the Invisible Library” builds nicely on the last article I reviewed. The author says a little more about Google’s “Knowledge Graph” panels (the information panels that sometimes pop up next to search results). These are drawn from the Knowledge Graph, which contains over 500 million data objects, complete with facts about them and relationships between them. This makes it a great example of a “semantic technology” – Web technologies evolving to be more about data objects and their relationships than about a series of pages connected by links. This evolution presents a problem for libraries, as it makes already outdated MARC records even less able to make library holdings visible through web searches.

Fortunately, alternate solutions are being developed. BIBFRAME, developed by the Library of Congress with help from Zepheira, is meant to “translate” MARC to the new linked data model. To further this goal, Zepheira announced the Libhub Initiative as a “proof of concept project.” This project will link library systems together with linked data, making library holdings easily visible in searches (and possibly even in Knowledge Graph panels). It would also give libraries control over their own data, a welcome change from having to rely on vendors.
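The basic shift from a standalone record to linked data is easier to see with a tiny example. Below is my own sketch (the URIs and property names are invented placeholders, not actual BIBFRAME vocabulary) of a book description re-expressed as subject–predicate–object triples, the data model that BIBFRAME builds on.

```python
# A flat, MARC-style record...
flat_record = {
    "title": "Moby-Dick",
    "author": "Melville, Herman, 1819-1891",
    "subject": "Whaling -- Fiction",
}

# ...re-expressed as linked-data-style triples. The URIs are invented
# placeholders, not real identifiers or BIBFRAME properties.
BOOK = "http://example.org/work/moby-dick"
AUTHOR = "http://example.org/person/melville-herman"

triples = [
    (BOOK, "http://example.org/prop/title", "Moby-Dick"),
    (BOOK, "http://example.org/prop/creator", AUTHOR),
    (AUTHOR, "http://example.org/prop/name", "Melville, Herman, 1819-1891"),
    (BOOK, "http://example.org/prop/subject", "Whaling -- Fiction"),
]

for s, p, o in triples:
    print(f"<{s}> <{p}> {o!r}")
```

Because the author is its own identified node rather than a text string buried in a record, anything else that points at the same URI is automatically connected to it.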

I found linked data and BIBFRAME very confusing topics when first introduced, but fortunately every article makes them a little clearer. This did a good job of giving me an overview of how linked data works, and what BIBFRAME’s concrete benefits would be. It also made me realize what a fundamental change in thinking the Knowledge Graph panels represent – I had thought they were just a slightly helpful extra perk for searchers.

Article Summary #5

This article presents a brief background on linked data, then discusses the ways library institutions are working to make library data available in this format. In 2011, three major search engines (Google, Bing, and Yahoo) announced the launch of schema.org, a “structured data markup vocabulary that enables webmasters to nest metadata,” which has sped up the adoption of structured data markup and let search engines do some exciting new things. An excellent example is Google’s “Knowledge Graph” panels – the small boxes of information that sometimes pop up to the right of search results. And this is just a small sample of how schema.org is helping create a “Web of Data” evolving out of the old web of separate pages.
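Schema.org markup is commonly embedded in a web page as JSON-LD. A minimal, hand-written example describing a book (the title and author are just sample values) looks roughly like this:

```python
import json

# Minimal schema.org description of a book, serialized as JSON-LD.
# A real page would embed this in a <script type="application/ld+json"> element.
book_markup = {
    "@context": "https://schema.org",
    "@type": "Book",
    "name": "Moby-Dick",
    "author": {"@type": "Person", "name": "Herman Melville"},
    "inLanguage": "en",
}

print(json.dumps(book_markup, indent=2))
```
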

Although libraries have mostly blocked search engines from crawling their sites since the early days of the Web, more librarians are now pushing to increase their visibility on the web. This effort faces new challenges from the emerging “Web of Data,” as libraries’ traditional MARC records are structured very differently from the kind of metadata search engines can read. OCLC and the Library of Congress are both trying to overcome these obstacles, though in slightly diverging ways. The Library of Congress is working on BIBFRAME, building from the ground up a compatible standard still tailored to library needs. OCLC is starting from schema.org’s existing markup, lobbying for and adapting extensions to make it more library-friendly.

This article was extremely helpful for me – partly because anything that helps make BIBFRAME and linked data clearer for me is valuable – and partly because it showed that there are always multiple useful ways for libraries to adapt and change to new technologies.

Article Summary #4

The Digital Public Library of America (DPLA) offers a single interface that lets users search for digital content across many different institutions. Since the search must work with many different metadata sets coming from many different sources, interoperability (“the ability of multiple systems with different hardware and software platforms, data structures, and interfaces to exchange data with minimal loss of content and functionality”) becomes a truly crucial issue. This article describes ways the DPLA achieves the interoperability needed to function, especially through the use of service hubs.

DPLA links to content from two types of entities: content hubs and service hubs. Content hubs are very large organizations – including the New York Public Library and HathiTrust – that each submit 200,000 or more items on their own. Organizations with fewer items that want to participate have to band together to form a service hub. The service hub helps standardize and enhance the metadata from its member organizations, making it easier for their records to be accepted into the DPLA. Another important way the DPLA makes these many different records interoperable is through its metadata application profile (MAP). This is a set of metadata elements drawn from many commonly used schemas, which lets the DPLA understand how elements from different schemas relate to each other.
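The idea behind an application profile is essentially a crosswalk: elements from different source schemas get mapped onto one shared set of properties. Here is a rough sketch of that idea – the mappings, field names, and records are simplified illustrations of my own and do not reproduce the actual DPLA MAP.

```python
# Toy crosswalk from two source schemas to a shared set of target elements.
# The mappings are simplified illustrations, not the real DPLA MAP.

CROSSWALK = {
    "dc:title": "title",
    "mods:titleInfo/mods:title": "title",
    "dc:creator": "creator",
    "mods:name/mods:namePart": "creator",
}

def to_profile(source_record):
    """Map a source record's fields onto the shared profile elements."""
    mapped = {}
    for source_field, value in source_record.items():
        target = CROSSWALK.get(source_field)
        if target:
            mapped.setdefault(target, []).append(value)
    return mapped

dublin_core_record = {"dc:title": "Map of Missouri, 1845", "dc:creator": "Unknown"}
mods_record = {"mods:titleInfo/mods:title": "Map of Missouri, 1845"}

print(to_profile(dublin_core_record))
print(to_profile(mods_record))
```

Once both records land in the same target elements, a single search interface can treat them alike.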

The article traced how a set of institutions in Missouri set about creating a service hub and getting their collections harvested by the DPLA. Since succeeding, they have contributed over 41,557 items to the DPLA. The members think the benefits conferred by inclusion (particularly an increased number of views of their digital collections) outweigh the costs and problems of getting to that point.

Right off the bat, the article taught me much more about the DPLA (I’d only visited it briefly before – it’s really neat). But it also discussed really practical ways to address interoperability problems, both through the MAP and service hubs. I’m sure these would be good options to think about if I ever have to deal with interoperability issues myself in the future.

Article Summary #3

I looked over an article released in 2003, in the very early days of MODS and METS, which outlines their basic structures and functions.

MODS (Metadata Object Description Schema) was developed as a sort of MARC-lite schema – somewhere between Dublin Core and MARC in complexity. Although its semantics are the same as MARC’s, MODS uses language-based tags, which are easier for people to understand. Although it doesn’t convert perfectly to MARC, its tags generally map well. It is a good option for original resource description that remains fairly compatible with other schemas.
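To get a feel for those language-based tags, here is a minimal MODS-style snippet built with Python’s standard library. The record content is invented, and a production record would carry many more elements; this only shows a title and a personal name.

```python
import xml.etree.ElementTree as ET

# Build a tiny MODS-flavored record. Only a couple of elements are shown;
# the content is invented for illustration.
mods = ET.Element("mods", xmlns="http://www.loc.gov/mods/v3")

title_info = ET.SubElement(mods, "titleInfo")
ET.SubElement(title_info, "title").text = "Letters from the Home Front"

name = ET.SubElement(mods, "name", type="personal")
ET.SubElement(name, "namePart").text = "Doe, Jane"

print(ET.tostring(mods, encoding="unicode"))
```

Compared with MARC’s numeric tags, element names like titleInfo and namePart are readable at a glance.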

METS (Metadata Encoding and Transmission Standard) is an XML format for packaging a digital resource’s metadata. It is composed of six modules that point to different types of metadata, though only two (header and structural) are absolutely required. The descriptive module is the part that holds the information librarians routinely work with – the records, cataloging, and so on – and it can accommodate multiple schemas (including MODS).
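A skeletal METS wrapper might be sketched as below – again just an illustration of my own, showing only a header, one descriptive section, and a structural map, with invented identifiers.

```python
import xml.etree.ElementTree as ET

# Skeleton of a METS document: header, one descriptive metadata section,
# and a structural map. Content and identifiers are invented.
mets = ET.Element("mets", xmlns="http://www.loc.gov/METS/")

ET.SubElement(mets, "metsHdr", CREATEDATE="2015-01-01T00:00:00")

dmd = ET.SubElement(mets, "dmdSec", ID="dmd1")
wrap = ET.SubElement(dmd, "mdWrap", MDTYPE="MODS")
ET.SubElement(wrap, "xmlData")  # a MODS record like the one sketched above would be nested here

struct_map = ET.SubElement(mets, "structMap")
ET.SubElement(struct_map, "div", LABEL="Item", DMDID="dmd1")

print(ET.tostring(mets, encoding="unicode"))
```
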

I haven’t researched changes that the intervening years might have brought (which might be a future blog post to look into), but the basic guidelines included in the article were very useful to look at. Reading the original intentions and structures of MODS and METS gave me a basic knowledge of what they are and how they work, while the example projects mentioned showed me some practical applications of the standards.

Controlled vocabulary for football, anyone?

Our LS 566 group spent the weekend trying to find a controlled vocabulary specifically for football (instead of using something like LCSH or creating our own). We have so far been unsuccessful (so if anyone has suggestions, I’d love to hear them).  The problems that I kept running into are ones that libraries always struggle with, and which I’ve heard discussed multiple times in SLIS courses. Primarily:

Cost – creating and maintaining a controlled vocabulary takes time, expertise, equipment, and information, which can get pretty expensive. For this reason, most of the useful-looking controlled vocabularies I found (several sports-specific ones, though none football-focused, are listed on Taxonomy Warehouse) are only available to paying customers. [This gave me the idea of checking UA’s library databases, though it didn’t seem like something we were particularly likely to have.]

Variants in natural language – what Americans call football is not the same sport that most of the world knows as football. My teammate was given a link to a controlled vocabulary that seems to be about soccer, and many of my own search hits had the same problem, forcing me to search for “American football” instead.
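If we do end up rolling our own vocabulary, even a small mapping of preferred terms with cross-references for those natural-language variants would help. Here is a toy sketch – the terms and preferred forms are just examples I made up, not a finished vocabulary.

```python
# Toy controlled vocabulary: map variant terms to a single preferred term.
# Terms and preferred forms are examples only.

PREFERRED = {
    "american football": "Football (American)",
    "gridiron football": "Football (American)",
    "football": "Football (American)",   # our local default sense
    "soccer": "Soccer",
    "association football": "Soccer",
}

def preferred_term(term):
    """Return the preferred form of a term, or None if it isn't covered."""
    return PREFERRED.get(term.strip().lower())

for query in ["Football", "association football", "quidditch"]:
    print(query, "->", preferred_term(query))
```
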
