Librarian in the Cloud

Sharing info thru the Web…One Web App @ a Time

Data Designed for Discovery

Roy Tennant

The Canonical Entity (of the past) — the card catalog

MARC was not created for online catalogs; it was created for card catalogs. That is a fundamental flaw. Reformatting MARC to be more readable isn’t much better.

The classic bib record

  • a collection of statements
  • taken from the piece itself
  • sometimes enhanced with inferred parentheticals
  • or additional statements not on the piece (e.g., subject headings)
  • where punctuation, which may or may not be present, is used (inconsistently) for structure
  • We’re dealing with uncontrolled text strings that are only loosely connected to anything else

Actually a number of problems

  • identification problems (titles aren’t enough; names aren’t enough)
  • linkage problems (web problem; language problem)
  • quality problems (legacy problems: strings are not controlled terms, and often they cannot be turned into them)

Example problems

  • Hamlet problem: no clues; too many results; the interface isn’t helping
  • Person searches such as John Rock

First define ALL THE THINGS

  • Work
  • Entity: a thing with distinct and independent existence (someone who created a work)
  • Relationship: the way in which two or more people or things are connected

Record: War and Peace

  • Entity (type: work): War and Peace, with its author
  • Entity (type: person): Leo Tolstoy
  • Entity (type: place): where the story took place

Entities of Initial Focus

  • person
  • place
  • object
  • concept
  • organization
  • work

Relationships between entities are established

  • person-author
  • work-subject-concept
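
The entity types and relationships listed above can be pictured as a small data structure. This is only a minimal sketch: the `Entity` and `Relationship` classes, the `ex:` identifiers, and the `setting` predicate are made up for illustration and are not taken from the talk or from any OCLC system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    id: str     # hypothetical local identifier, not a real VIAF or WorldCat ID
    type: str   # person, place, object, concept, organization, or work
    label: str


@dataclass(frozen=True)
class Relationship:
    subject: Entity
    predicate: str
    object: Entity


tolstoy = Entity("ex:person/1", "person", "Leo Tolstoy")
war_and_peace = Entity("ex:work/1", "work", "War and Peace")
russia = Entity("ex:place/1", "place", "Russia")

relationships = [
    Relationship(war_and_peace, "author", tolstoy),
    Relationship(war_and_peace, "setting", russia),
]


def related(entity, rels):
    """Return (predicate, other entity) pairs for every relationship touching entity."""
    pairs = []
    for r in rels:
        if r.subject == entity:
            pairs.append((r.predicate, r.object))
        elif r.object == entity:
            pairs.append((r.predicate, r.subject))
    return pairs


for predicate, other in related(war_and_peace, relationships):
    print(predicate, "->", other.label)
```

Because each entity has its own identifier, the link from the work to its author stays the same even if the display label changes.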

Critical: using authoritative sources whenever possible (e.g., the Virtual International Authority File, VIAF)

And linking to other authoritative data sources

This process is called “shredding.” Try not to use the word “record” anymore; instead, think of a set of assertions about a thing: a place, a person, a work, an organization. Example: every instance of William Shakespeare.
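
To make the idea of shredding concrete, here is a rough sketch of turning one flat record into a set of assertions. Everything in it is an assumption for illustration: the `shred` function, the field names, the `ex:` identifiers, and the tiny `AUTHORITY` lookup table standing in for an authority source such as VIAF.

```python
# Hypothetical authority table: several text forms map to one identifier.
AUTHORITY = {
    "Shakespeare, William": "ex:person/shakespeare",
    "William Shakespeare": "ex:person/shakespeare",
}


def shred(record, subject_id, authority=AUTHORITY):
    """Turn a flat record (field -> string or list of strings) into
    (subject, predicate, object) assertions, swapping known strings
    for identifiers where the authority table has a match."""
    assertions = []
    for field, value in record.items():
        values = value if isinstance(value, list) else [value]
        for text in values:
            text = text.strip()
            assertions.append((subject_id, field, authority.get(text, text)))
    return assertions


record = {
    "title": "Hamlet",
    "author": "Shakespeare, William",
    "subject": ["Denmark", "Revenge"],
}

assertions = shred(record, "ex:work/hamlet")
for a in assertions:
    print(a)
```

Strings with no match in the table are left as plain text, which mirrors the point above that some legacy strings cannot be turned into controlled terms.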

From Records to Entities: Works

  • All manifestations of Hamlet under one umbrella
  • Making sausage
  • Some of the info from all the manifestations flows up to the work record
  • In a WorldCat record, look for the Linked Data section
  • Work example: all the manifestations of the work
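
One way to picture gathering manifestations under a single work is to group them by a normalized author and title key. This is a deliberately crude sketch, not how WorldCat actually clusters works; the `work_key` and `group_into_works` functions and the sample data are invented for illustration.

```python
import re
from collections import defaultdict


def work_key(author, title):
    """Build a crude grouping key: lowercase, punctuation collapsed to spaces."""
    def norm(s):
        return re.sub(r"[^a-z0-9]+", " ", s.lower()).strip()
    return (norm(author), norm(title))


def group_into_works(manifestations):
    """Group manifestation dicts under their work key."""
    works = defaultdict(list)
    for m in manifestations:
        works[work_key(m["author"], m["title"])].append(m)
    return dict(works)


def formats_for(work):
    """Let one piece of information flow up from manifestations to the work."""
    return sorted({m["format"] for m in work})


manifestations = [
    {"author": "Shakespeare, William", "title": "Hamlet", "format": "book"},
    {"author": "Shakespeare, William", "title": "Hamlet.", "format": "audiobook"},
    {"author": "Shakespeare, William", "title": "Macbeth", "format": "book"},
]

works = group_into_works(manifestations)
hamlet = works[("shakespeare william", "hamlet")]
print(len(hamlet), formats_for(hamlet))
```

A real clustering process would need far more than string normalization (translations, variant titles, and so on), which is why this is only a sketch.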

WorldCat: Linked Data & Entities + LCSH; VIAF; FAST

Knowledge Vault data flow:

  • Data sources: Enhanced WorldCat; VIAF; FAST; Etc.
  • Extracts
  • Collective: Knowledge Triples
  • Fusion
  • ???
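
The notes do not record how the fusion step works. Purely as an assumption, and not a description of OCLC’s pipeline, one simple reading is that triples extracted from each source are pooled and each triple keeps track of which sources support it. The `fuse` function, the `ex:` identifiers, and the sample triples below are all hypothetical.

```python
from collections import defaultdict


def fuse(extracts):
    """Pool triples from several sources.

    extracts maps a source name to an iterable of (subject, predicate, object)
    triples. The result maps each distinct triple to the sorted list of
    sources that asserted it.
    """
    support = defaultdict(set)
    for source, triples in extracts.items():
        for triple in triples:
            support[triple].add(source)
    return {triple: sorted(sources) for triple, sources in support.items()}


t1 = ("ex:work/hamlet", "author", "ex:person/shakespeare")
t2 = ("ex:work/hamlet", "subject", "ex:concept/revenge")
t3 = ("ex:person/shakespeare", "type", "person")

extracts = {
    "WorldCat": [t1, t2],
    "VIAF": [t1, t3],
    "FAST": [t2],
}

fused = fuse(extracts)
for triple, sources in fused.items():
    print(triple, "supported by", sources)
```

Counting supporting sources is one possible basis for ranking assertions, but the talk notes do not say whether that is the approach taken.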

So What?

  • Exposure: traffic has improved as linked data has been released
  • Improving discovery: an author screen with a description from Wikipedia, showing his works and works about him
  • By making links, and making them explicit and unique, connections like these become easy, not difficult
  • Entity cards: person, org, place, concept, creativework, event + related people & orgs; related concepts; related places; related works
  • We’re actually using the work we’ve been doing for the last 50 years, using the machine data, in new ways that aren’t ambiguous: not text strings, but unique identifiers
  • Linked Data sets: EntityJS; War Between the States collection example: Person; Organization; Concept; Place; Event; Work. They also found a way to rank these sets of data
  • Can make links dynamically & in batch

Why This Work Matters

  • Fiction Finder: solving the Hamlet problem
  • Embedding authority control (Wikipedia connections): VIAF authority files embedded into Wikipedia articles. This solves the Wang/Li/Zhang problem (the three most common names in China; a name isn’t enough)
  • Helps with translation work: solving the language problem
  • MARC usage in WorldCat: cleaning up and normalizing the data. Cataloging has varied over time, with mistakes and differences in interpretation of the rules, all over 50 years. This exposes the quality problem. Quality is a pursuit, not a destination

It’s not the record, it’s the linkage

  • requires a new kind of thinking: “sets of assertions about something” NOT a “record”
  • Requires much more than simply translating a record from MARC to a new format; it is not a 1:1 translation
  • There are things we can do now to make MARC “linked data ready” – Upcoming OCLC webinar
  • Quality is a pursuit, not an endgame

For a lot more information: Library Linked Data in the Cloud

The National Library of Medicine is doing a lot of work with linked data, too.

This is way out on the cutting edge; you don’t necessarily have to go out and do anything. Sit back & wait. OCLC is using the Schema.org standard rather than the BIBFRAME schema that LOC is developing, but it also supports the BIBFRAME standard under development.
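
For readers who have not seen Schema.org markup, here is a small sketch of what a description using that vocabulary can look like when serialized as JSON-LD. `Book`, `Person`, `name`, `author`, and `sameAs` are real Schema.org terms; the `book_jsonld` helper and the placeholder VIAF URI are assumptions for illustration, and this is not the exact markup WorldCat emits.

```python
import json


def book_jsonld(title, author_name, author_uri):
    """Build a minimal Schema.org description of a book as a JSON-LD dict."""
    return {
        "@context": "https://schema.org",
        "@type": "Book",
        "name": title,
        "author": {
            "@type": "Person",
            "name": author_name,
            # Link to an authoritative identifier instead of relying on the name string.
            "sameAs": author_uri,
        },
    }


# The URI below is a placeholder, not a real VIAF identifier.
doc = book_jsonld("War and Peace", "Leo Tolstoy", "http://viaf.org/viaf/PLACEHOLDER")
print(json.dumps(doc, indent=2))
```

The `sameAs` link is where the “it’s the linkage” idea shows up: the author is identified by a URI, and the name is just a label.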

What about RDA? How does it fit into all of this? RDA adds incremental benefit, but he can’t characterize what that benefit is; he isn’t sure how to qualify or quantify it.

Large data projects have used crowdsourcing to help. Has OCLC considered this? The EntityJS & Knowledge Vault work will include a way to do this.

New world: Data flow, not static.

Author: Heather Braum. Posted on October 26, 2015 (updated October 31, 2015). Categories: Conferences Notes, Internet Librarian. Tags: discovery, internetlibrarian, linked data, metadata
