By the Book

April 2017

Article Summary #5

This article presents a brief background on linked data, then discusses how library institutions are working to make library data available in this format. In 2011, three major search engines (Google, Bing, and Yahoo) announced the launch of schema.org, a “structured data markup vocabulary that enables webmasters to nest metadata,” which has sped up the adoption of structured data markup and let search engines do some exciting new things. An excellent example is Google’s “Knowledge Graph” panels – the small boxes of information that sometimes pop up to the right of search results. And this is just a small sample of how schema.org is helping the web evolve from a collection of separate pages into a “Web of Data.”
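
To make the idea concrete, here is a tiny sketch (my own, not from the article) of what schema.org structured data for a book might look like, serialized as JSON-LD from Python. The title, author, and ISBN are made-up placeholders.

```python
import json

# Minimal schema.org description of a book, nesting metadata the way the
# vocabulary allows (a Person entity inside the Book entity).
book = {
    "@context": "https://schema.org",
    "@type": "Book",
    "name": "Example Title",                                # placeholder title
    "author": {"@type": "Person", "name": "Jane Author"},   # placeholder author
    "isbn": "978-0-00-000000-0",                            # placeholder ISBN
}

# Search engines read blocks like this when they are embedded in a web page.
print(json.dumps(book, indent=2))
```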

Although libraries have mostly blocked search engines from crawling their sites since the early days of the Web, more librarians are now pushing to increase their visibility on the web. This effort faces new challenges from the emerging “Web of Data,” because libraries’ traditional MARC records are structured very differently from the metadata browsers can read. OCLC and the Library of Congress are trying to overcome these obstacles, though in slightly diverging ways. The Library of Congress is developing BIBFRAME, building from the ground up a standard that is compatible with linked data yet still tailored to library needs. OCLC is starting from schema.org’s microtags, lobbying for changes and adapting them to make them more library-friendly.
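
For my own understanding, here is a rough sketch of the linked-data idea using Python’s rdflib library: the book from the snippet above becomes a set of triples that machines can follow, rather than a self-contained MARC record. The catalog URI is a made-up placeholder.

```python
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF

SCHEMA = Namespace("https://schema.org/")

g = Graph()
g.bind("schema", SCHEMA)

# A hypothetical catalog item expressed as subject-predicate-object triples.
work = URIRef("http://example.org/catalog/12345")  # placeholder URI
g.add((work, RDF.type, SCHEMA.Book))
g.add((work, SCHEMA.name, Literal("Example Title")))
g.add((work, SCHEMA.author, Literal("Jane Author")))

# Serialize as Turtle, a common linked-data format.
print(g.serialize(format="turtle"))
```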

This article was extremely helpful for me – partly because anything that makes BIBFRAME and linked data clearer is valuable, and partly because it showed that there are always multiple useful ways for libraries to adapt to new technologies.

Article Summary #4

The Digital Public Library of America (DPLA) offers a single interface that lets users search digital content across many different institutions. Since the search must work with metadata sets coming from many different sources, interoperability (“the ability of multiple systems with different hardware and software platforms, data structures, and interfaces to exchange data with minimal loss of content and functionality”) becomes a truly crucial issue. This article describes the ways DPLA achieves the interoperability it needs to function, especially through its use of service hubs.

DPLA links to content from two types of entities: content hubs and service hubs. Content hubs are very large organizations – including the New York Public Library and HathiTrust – that each submit 200,000 or more items on their own. Organizations with fewer items that want to participate must band together to form a service hub. The service hub helps standardize and enhance the metadata from its member organizations, making it easier for their records to be accepted into DPLA. Another important way DPLA makes so many different records interoperable is its metadata application profile (MAP), a set of metadata elements drawn from many commonly used schemas, which lets DPLA understand how elements from different schemas relate to each other.
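
Here is a toy crosswalk (my own illustration – the field names are not DPLA’s actual MAP) showing the basic idea: elements from different source schemas are mapped onto one shared set of fields so the records can be searched together.

```python
# Map each source schema's elements onto a shared profile (illustrative names).
CROSSWALK = {
    "dc":   {"title": "title", "creator": "creator", "date": "date"},
    "mods": {"titleInfo/title": "title",
             "name/namePart": "creator",
             "originInfo/dateIssued": "date"},
}

def normalize(record: dict, schema: str) -> dict:
    """Translate a source record's fields into the shared profile."""
    mapping = CROSSWALK[schema]
    return {mapping[field]: value
            for field, value in record.items() if field in mapping}

# A MODS-style record and a Dublin Core record end up with the same fields.
print(normalize({"titleInfo/title": "Example", "name/namePart": "Smith, A."}, "mods"))
print(normalize({"title": "Example", "creator": "Smith, A."}, "dc"))
```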

The article traced how a group of institutions in Missouri set about creating a service hub and getting harvested into the DPLA. Since succeeding, they have contributed over 41,557 items. The members feel that the benefits of inclusion (particularly increased views of their digital collections) outweigh the costs and difficulties of getting to that point.

Right off the bat, the article taught me much more about the DPLA (I’d only visited it briefly before – it’s really neat). But it also discussed really practical ways to address interoperability problems, both through the MAP and service hubs. I’m sure these would be good options to think about if I ever have to deal with interoperability issues myself in the future.

Article Summary #3

I looked over an article released in 2003, in the very early days of MODS and METS, which outlines their basic structures and functions.

MODS (Metadata Object Description Schema) was developed as a sort of MARC-lite schema – somewhere between Dublin Core and MARC in complexity. Although its semantics are the same as MARC’s, MODS uses language-based tags, which are easier for people to understand. It doesn’t convert perfectly to MARC, but its tags generally map well. It is a good option for original resource description that remains fairly compatible with other schemas.
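
As a quick exercise, here is a minimal MODS record built with Python’s standard library (the namespace is the real MODS v3 one; the title and name are placeholders). Note the language-based tags standing in for MARC’s numeric fields.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mods", MODS_NS)

# <titleInfo><title> and <name><namePart> rather than MARC 245 and 100.
mods = ET.Element(f"{{{MODS_NS}}}mods")
title_info = ET.SubElement(mods, f"{{{MODS_NS}}}titleInfo")
ET.SubElement(title_info, f"{{{MODS_NS}}}title").text = "Example Title"
name = ET.SubElement(mods, f"{{{MODS_NS}}}name")
ET.SubElement(name, f"{{{MODS_NS}}}namePart").text = "Jane Author"

print(ET.tostring(mods, encoding="unicode"))
```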

METS (Metadata Encoding and Transmission Standard) is an XML document format for packaging a digital resource’s metadata. It is composed of six modules that point to different types of metadata, though only two (the header and the structural map) are absolutely required. The descriptive module holds the information librarians routinely work with – the records, cataloging, and so on – and it can accommodate multiple schemas (including MODS).
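
And a matching skeleton for METS, again just my own sketch: it includes only the two sections the article calls required (the header and the structural map); a real file would add the descriptive module and others as needed.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
ET.register_namespace("mets", METS_NS)

# Bare-bones METS wrapper: header plus structural map, nothing else.
mets = ET.Element(f"{{{METS_NS}}}mets")
ET.SubElement(mets, f"{{{METS_NS}}}metsHdr", CREATEDATE="2017-04-01T00:00:00")
struct_map = ET.SubElement(mets, f"{{{METS_NS}}}structMap")
ET.SubElement(struct_map, f"{{{METS_NS}}}div", LABEL="Example resource")

print(ET.tostring(mets, encoding="unicode"))
```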

I haven’t researched the changes that the intervening years may have brought (a possible future blog post), but the basic guidelines in the article were very useful. Reading about the original intentions and structures of MODS and METS gave me a basic understanding of what they are and how they work, while the example projects mentioned showed some practical applications of the standards.

Controlled vocabulary for football, anyone?

Our LS 566 group spent the weekend trying to find a controlled vocabulary specifically for football (instead of using something like LCSH or creating our own). We have so far been unsuccessful (so if anyone has suggestions, I’d love to hear them). The problems I kept running into are ones libraries always struggle with, and which I’ve heard discussed multiple times in SLIS courses. Primarily:

Cost – creating and maintaining a controlled vocabulary takes time, expertise, equipment, and information, which can get pretty expensive. For this reason, a majority of the useful-looking controlled vocabularies I found (several are listed on Taxonomy Warehouse – they are sports-specific ones, though not football-focused) are only available to paying customers. [This gave me the idea of checking UA’s library databases, though it didn’t seem like something we were particularly likely to have.]

Variants in natural language – what Americans call football is not the same sport most of the world knows as football. My teammate was given a link to a controlled vocabulary that seems to be about soccer, and many of my search hits presented the same problem, forcing me to search for “American football” instead. (A toy sketch of how a vocabulary can resolve such variants follows below.)
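
Here is the sketch mentioned above – a miniature controlled-vocabulary entry (terms invented by me, not taken from any real vocabulary) in which variant terms resolve to a preferred heading, which is exactly the kind of thing we were hoping to find ready-made.

```python
# Tiny controlled vocabulary: preferred headings with their "use for" variants.
VOCAB = {
    "American football": ["football (US)", "gridiron football"],
    "association football": ["soccer", "football (international)"],
}

def preferred_term(query: str) -> str | None:
    """Resolve a variant term to its preferred heading, if any."""
    q = query.lower()
    for preferred, variants in VOCAB.items():
        if q == preferred.lower() or q in (v.lower() for v in variants):
            return preferred
    return None

print(preferred_term("soccer"))  # -> "association football"
```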

Article Summary #2

This short article contains a helpful discussion with Laura Dawson about the ISNI (International Standard Name Identifier, pronounced to rhyme with Disney). ISNIs identify people the way ISBNs identify books: assigning unique numbers to writers, artists, and other public figures makes it easier to distinguish between different people with similar names. It is also very helpful for collecting all the different spellings of a non-English author’s translated name under one number, making clear that they all refer to the same person. ISNIs are overseen by the ISO (International Organization for Standardization), which also oversees ISBNs, ISSNs, and DOIs. Under ISO’s governance, the ISNI International Agency sets ISNI policies, and using those policies, the ISNI Assignment Agency (currently OCLC, interestingly enough) assigns the actual numbers to names. Registration agencies act as go-betweens for OCLC and those wanting ISNI numbers.
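
Out of curiosity, I sketched the checksum side of this in Python. As I understand it, an ISNI is a 16-character identifier whose last character is a check digit computed with the ISO 7064 MOD 11-2 scheme (the same one ORCID uses); the 15-digit base below is just an example, not a real person’s ISNI.

```python
def isni_check_digit(base15: str) -> str:
    """ISO 7064 MOD 11-2 check character for a 15-digit ISNI base."""
    total = 0
    for ch in base15:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)

def is_valid_isni(isni: str) -> bool:
    """Validate a 16-character ISNI (spaces and hyphens are allowed)."""
    s = isni.replace(" ", "").replace("-", "").upper()
    return (len(s) == 16 and s[:15].isdigit()
            and isni_check_digit(s[:15]) == s[15])

base = "000000012146438"               # example 15-digit base
print(base + isni_check_digit(base))   # appends the computed check character
print(is_valid_isni("0000 0001 2146 438X"))  # -> True
```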

Dawson’s answers gave me a good basic knowledge of the purpose and structure of ISNIs. Knowing how they are assigned and governed should help me use ISNIs more effectively when creating metadata records in the future – both recognizing them when they appear and perhaps adding them to my own records.
