New to the SCN: Analyzing Institutional Publishing Output

This is the latest post in a series announcing resources created for the Scholarly Communication Notebook, or SCN. The SCN is a hub of open teaching and learning content on scholcomm topics that is both a complement to an open book-level introduction to scholarly communication librarianship and a disciplinary and course community for inclusively sharing models and practices. IMLS funded the SCN in 2019, permitting us to pay creators for their labor while building a solid initial collection. These works are the result of one of three calls for proposals (our first CFP was issued in fall 2020, the second in late spring 2021, and the third in late fall 2021).

Today we’re excited to share “Analyzing Institutional Publishing Output: A Short Course” (available in the SCN OER Commons Hub as well as in Google Drive and Penn State’s ScholarSphere), contributed by Allison Langham-Putrow and Ana Enriquez, who do scholarly communication work at the University of Minnesota Twin Cities and Penn State University, respectively. Here are Allison and Ana to introduce their project:

We met way back in October 2018 at the “Choosing Pathways to OA” working forum held at University of California-Berkeley, a meeting at which attendees were encouraged to discuss a wide range of options for moving from a subscription-based world to one in which library budgets are used to support open access publishing.

We had both been analyzing publishing patterns for our institutions. Allison, influenced by an opinion piece by Liam Earney, a blog post by Danny Kingsley, and a class at the 2018 FORCE11 Scholarly Communication Institute by Katie Shamash titled “How Much Does Open Access Cost?”, had been looking into where researchers from University of Minnesota publish, trying to figure out how much was open access, and looking (in vain) for information on peer review and editorships. Ana was using publishing data to plan outreach programs and trying to learn about Penn State authors’ APC payments.

The “read-and-publish” agreement between MIT and the Royal Society of Chemistry was advertised as the first such agreement in the US. It received a lot of press, at least in the scholarly communication/open access world, and was a hot topic at the Choosing Pathways event. It was just a matter of time before the model came across the radars of our libraries’ administration. In fact, that happened just about two months after we met.

Since late 2018, we’ve analyzed publication patterns for our institutions and for the other members of the Big Ten Academic Alliance, both overall and with specific publishers. We’ve learned a lot, and we proposed this course for the Scholarly Communication Notebook because we think this kind of analysis will become even more important as publishers continue to design and push new types of OA publishing agreements.

We also just think data analysis is fun and want to share the fun.

So what is Analyzing Institutional Publishing Output: A Short Course? It’s a set of training materials that walk through how to create a set of publication data, gather additional information about the data through an API, clean the data, and analyze it in various ways. We separated it into two sections: Section 1 describes how to build a dataset using data from one of three sources (Web of Science, Scopus, and the Lens) and using the Unpaywall API, via OpenRefine, to enrich it with open access information; Section 2 has five lessons on analyzing the dataset. One of us prefers Excel for analysis; the other prefers OpenRefine. We learned a lot from each other, but no one’s mind was changed, so we wrote the analysis lessons with instructions for both.
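For readers who prefer scripting, the enrichment step described above can be sketched in Python. This is an illustration only, not part of the course, which works through OpenRefine and Excel. It assumes each publication record is a dictionary with hypothetical `doi` and `publisher` keys, and it reads only the `is_oa` and `oa_status` fields from the Unpaywall v2 response. The function names (`unpaywall_url`, `enrich`, `oa_counts_by_publisher`) and the `no_doi` and `lookup_failed` labels are invented for this example.

```python
import json
from collections import Counter
from urllib.parse import quote, urlencode
from urllib.request import urlopen

UNPAYWALL_BASE = "https://api.unpaywall.org/v2/"


def unpaywall_url(doi, email):
    """Build the Unpaywall v2 lookup URL for one DOI.

    Unpaywall asks callers to identify themselves with an email parameter.
    """
    return f"{UNPAYWALL_BASE}{quote(doi.strip(), safe='/')}?{urlencode({'email': email})}"


def fetch_json(url):
    """Fetch a URL and decode its JSON body (requires network access)."""
    with urlopen(url, timeout=30) as response:
        return json.load(response)


def enrich(records, email, fetch=fetch_json):
    """Return copies of the records with is_oa and oa_status added.

    `fetch` is injectable so the logic can be tried offline with canned data.
    Records without a DOI, or whose lookup fails, are kept and labeled
    rather than dropped, so they can be reviewed during data cleaning.
    """
    enriched = []
    for record in records:
        row = dict(record)
        doi = (row.get("doi") or "").strip()
        if not doi:
            row["is_oa"], row["oa_status"] = None, "no_doi"
        else:
            try:
                data = fetch(unpaywall_url(doi, email))
                row["is_oa"] = data.get("is_oa")
                row["oa_status"] = data.get("oa_status", "unknown")
            except Exception:
                row["is_oa"], row["oa_status"] = None, "lookup_failed"
        enriched.append(row)
    return enriched


def oa_counts_by_publisher(rows):
    """Count OA statuses per publisher, e.g. {"Publisher A": Counter(...)}."""
    counts = {}
    for row in rows:
        publisher = row.get("publisher") or "unknown"
        counts.setdefault(publisher, Counter())[row["oa_status"]] += 1
    return counts
```

With a list of records exported from Web of Science, Scopus, or the Lens and reduced to these two keys, `oa_counts_by_publisher(enrich(records, "you@example.edu"))` would give a first look at how open access status is distributed across publishers. For large datasets, pausing between requests is a sensible courtesy to the API.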

By doing these analyses, librarians can develop a critical eye for the data and learn to work with it to make sustainable and values-driven decisions. Library agreements with publishers are at a crucial turning point, as they more and more often include OA publishing. Our short course prepares you to enter into negotiations with a publisher. Publishers hold a lot of power, and a deep understanding of what publishing looks like at your institution can make the uneven playing field of library-publisher negotiations slightly more even.

We hope you’ll take our course and that you’ll share it with others.

About the Authors

Allison Langham-Putrow is the Scholarly Communication Librarian at the University of Minnesota. She supports the University of Minnesota Libraries in exploring new approaches for sharing, preserving, and enhancing the impact of scholarly activity. Her background is in engineering: she earned a Ph.D. in chemical engineering and has over 20 years of research experience. She cares deeply about open access to research and works with colleagues and publishers on how to make open access happen in an equitable way.

Ana Enriquez is the Scholarly Communications Outreach Librarian at the Penn State Libraries. A copyright lawyer and librarian, Ana works to improve access to research at Penn State and through inter-institutional collaborations. She also teaches the university community about open access, publishing, copyright, and related topics.