I want to revisit a project I worked on in my first semester in the MALS program. This is how I introduced the project back then (December 2016):
As the curator and cataloger of a zine library with holdings going back to the early 1990s, I am sometimes asked to comment on how zines have changed over time. I read and catalog zines out of chronological order, as they rise to the top of the processing queue, which makes it hard to answer that question with confidence, though I have my theories. My suspicion is that zine creators in the 1990s wrote more about sexual assault and critiqued capitalist systems of oppression more often than their 2010s counterparts, who are more likely to write about mental health and friendship. My informed assumptions extend to the visual elements of the works: 1990s creators worked primarily, even exclusively, in black-and-white photocopies with photographs, reproduced zine ads, hand drawings, and riot grrrl fliers, as opposed to later creators' more sophisticated reprography and desktop publishing (InDesign, rather than Publisher or analog cut and paste).
My hope is to show the changes in zine content and presentation over time, perhaps through word clustering, word counts, or word clouds, presented in an interactive timeline.
At the time, I had no experience visualizing data in any way other than in tables, bar charts, and word clouds. I am looking forward to exploring the data anew with the skills I've gained in Visualization and Design: Fundamentals.
I frequently get asked how zines have changed over the years: by journalists, undergraduates, senior scholars, zine makers, librarians, and audience members at panels. The answer could interest people working in girls' and women's studies, gender studies, creative nonfiction, media studies, cultural studies, punk history, social justice, sociology, psychology, history of the book, English, library and information studies, and whatever other disciplines people imagine. Maybe it will appeal to data scientists, too.
I have a fall 2016 export of catalog records from the Barnard Zine Library that contains metadata about more than 4,000 zines. Ideally, I would also work with zine records from other libraries, but since my long-term project, ZineCat, does not yet hold enough records from additional libraries, I will work with this export, which is a substantial dataset on its own.
For this project I will focus on the Library of Congress Subject Headings (LCSH) that describe the zines. LCSH is a flawed controlled vocabulary employed by academic and other research libraries to describe materials of all format types. I say flawed because the vocabulary reflects a massive quantity of library holdings, is rooted in colonialism, hegemony, and Congress itself, and is known to be othering, reifying default identities: whiteness, maleness, heterosexuality, cisgender identity, Christianity, etc. Still, it's the system we have, and the one I use in my zine cataloging practice.
I hope to visualize the data in a few different ways:
- A treemap for each lustrum (five-year period) and overall, to get a sense of the most represented LCSH.
- A dispersion plot, also for each lustrum and overall, to better illustrate the change over time.
- The top n (5?) LCSH as they are represented in each lustrum.
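A minimal sketch of the top-n count, assuming the headings have already been reshaped into one row per (zine key, lustrum, heading); the function name `top_n_by_lustrum` is hypothetical. It counts distinct zines rather than rows, so a zine is never counted twice under the same heading, and it reports each heading's share of the zines in that lustrum. Because one zine carries several headings, those shares can add up to more than 100%.

```python
from collections import defaultdict

def top_n_by_lustrum(rows, n=5):
    """rows: iterable of (zine_key, lustrum, heading) tuples in long format.

    Returns {lustrum: [(heading, zine_count, percent_of_zines), ...]} with at
    most n headings per lustrum, most frequent first.
    """
    zines_per_heading = defaultdict(set)  # (lustrum, heading) -> zine keys
    zines_per_lustrum = defaultdict(set)  # lustrum -> zine keys
    for key, lustrum, heading in rows:
        zines_per_heading[(lustrum, heading)].add(key)
        zines_per_lustrum[lustrum].add(key)

    result = {}
    for lustrum, all_keys in zines_per_lustrum.items():
        counts = [
            (heading, len(keys))
            for (lus, heading), keys in zines_per_heading.items()
            if lus == lustrum
        ]
        # Highest count first; ties broken alphabetically so output is stable.
        counts.sort(key=lambda pair: (-pair[1], pair[0]))
        result[lustrum] = [
            (heading, count, round(100 * count / len(all_keys), 1))
            for heading, count in counts[:n]
        ]
    return result

# Made-up example rows; the duplicate z3 row is only counted once.
rows = [
    ("z1", "1990-1994", "Feminism"),
    ("z1", "1990-1994", "Rape"),
    ("z2", "1990-1994", "Feminism"),
    ("z3", "2010-2014", "Mental health"),
    ("z3", "2010-2014", "Mental health"),
]
print(top_n_by_lustrum(rows, n=2))
# {'1990-1994': [('Feminism', 2, 100.0), ('Rape', 1, 50.0)], '2010-2014': [('Mental health', 1, 100.0)]}
```

Counting sets of keys instead of rows is the design choice that keeps a zine with many headings from looking like many zines.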
Notes from chatting with Erin 6/14/18
- Like with the UN data: region > subregion > country
- LCSH and genre terms
- % of total
- Area chart for looking at spikes in holdings
- Dear dog, the subject headings and genre terms are all mashed together in one column. That's going to suck.
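One way to unmash that column and build a region > subregion > country style hierarchy out of LCSH, sketched in Python. It assumes the terms in a cell are separated by semicolons (Zotero's CSV export appears to use "; " between tags, but check the actual file) and that subdivisions are marked with "--". Commas would be a poor delimiter, since headings such as "Comic books, strips, etc." contain them. The function names are hypothetical, and nothing here tells LCSH apart from genre terms; that would need the MARC 650 vs. 655 tags.

```python
from collections import Counter

def split_tags(cell, sep=";"):
    """Split one mashed-together cell into a list of trimmed, non-empty terms."""
    return [term.strip() for term in cell.split(sep) if term.strip()]

def heading_levels(heading, depth=3):
    """Split an LCSH string on '--' into exactly `depth` levels.

    Extra subdivisions are folded into the last level; missing levels are ''.
    """
    parts = [part.strip() for part in heading.split("--") if part.strip()]
    if len(parts) > depth:
        parts = parts[:depth - 1] + [" -- ".join(parts[depth - 1:])]
    return parts + [""] * (depth - len(parts))

def top_level_counts(cells, sep=";"):
    """Count how many cells (zines) carry each top-level heading at least once."""
    counts = Counter()
    for cell in cells:
        # A set, so two subdivided forms of one heading count once per zine.
        counts.update({heading_levels(term)[0] for term in split_tags(cell, sep)})
    return counts

cell = "Feminism -- United States -- Periodicals; Zines; Punk culture -- Comic books, strips, etc."
for term in split_tags(cell):
    print(heading_levels(term))
# ['Feminism', 'United States', 'Periodicals']
# ['Zines', '', '']
# ['Punk culture', 'Comic books, strips, etc.', '']
```

The fixed-depth levels map directly onto a three-level hierarchy in a treemap.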
Data cleaning steps
- Opened the whole csv file cliozines.
- Deleted unnecessary columns.
- Removed undated, pre-1990, and post-2015 zines.
- Changed to .xls so multiple sheets could be saved.
- Um, why do only 225 records have "automatic tags"? Going back to the export…
- I'm concerned that if I repeat what I did last time, opening the .mrc file in Zotero, I'll lose all my LCSH again because they'll be grouped in automatic tags and somehow consumed.
- I've asked the internet for help, but it seems like I need XSLT to transform the records. There are a couple/few online transformation tools, but they're failing me. Maybe because my file is too large.
- Downloading MarcEdit.
- I am all about trying to convert .mrc to .csv using MarcEdit.
- Instructions for that are on a friend's repo.
- MarcEdit keeps crashing at a crucial moment.
- Installing Homebrew, after R'ing TFM.
- Now I'm going back to trying to convert from xml and txt.
- The txt conversion worked, but was not useful.
- Here's what the xml conversion looked like
- I've asked a friend with MarcEdit to see if she can process my file without crashing the application. Fingers crossed.
- It's frustrating that I have to struggle so much with the data before I even get it to Tableau. (Before I get it to Excel, even!)
- While I'm waiting for Rhonda, I'm trying a few things with the txt conversion to see if Text to Columns can do anything for me.
- Or PIVOT TABLES because they're MAGIC.
- And failing me. (nb reverse that)
- Text to Columns is not useful in the xml to csv file.
- Next in-the-meantime task: working with the small dataset from my old file that has LCSH (223 of 4,551)
- I'm getting rid of two more records that have asterisks in the automatic tags column. Why???
- Ugh, the automatic tags column pre-sorted alphabetically. Zotero, I love you, but I hate you.
- Deleted a few more that are from outside this dataset.
- More hand-cleaning because stupid Zotero smushed and alphabetized.
- Tried: exporting records directly from the catalog. That doesn't appear to be possible for me.
- Got XML to JSON. Now trying JSON to CSV. My file is too big for the GUI converters.
- I HAVE BEEN TRYING TO GET AT MY DATA ALL DAMN DAY.
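Since the GUI converters choked on the file size, here is a sketch of streaming MARCXML straight to CSV with Python's standard library (`xml.etree.ElementTree.iterparse`), which handles one record at a time instead of loading the whole file into memory. It assumes the .mrc file has already been converted to MARCXML; the function names, tag list, and file names are hypothetical. It keeps 650 and 655 in separate columns, so LCSH and genre terms are not mashed together, and it skips 650s whose second indicator is not 0 (0 means LCSH; 7 means the source is named in $2, which is how OCLC-applied FAST headings are coded).

```python
import csv
import xml.etree.ElementTree as ET

TAGS = ("260", "264", "650", "655")
SUBJECT_CODES = set("avxyz")  # 6xx heading ($a) plus subdivisions ($v $x $y $z)

def local(tag):
    """Drop a '{namespace}' prefix such as MARCXML's MARC21/slim namespace."""
    return tag.rsplit("}", 1)[-1]

def marcxml_rows(source, tags=TAGS):
    """Yield one dict per record from a MARCXML path or binary file object.

    Repeated fields are joined with '; '. For 6xx fields only subfields
    a/v/x/y/z are kept, joined with ' -- '; other fields join all subfields.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if local(elem.tag) != "record":
            continue
        row = {"001": ""}
        fields = {tag: [] for tag in tags}
        for field in elem:
            kind, tag = local(field.tag), field.get("tag")
            if kind == "controlfield" and tag == "001":
                row["001"] = (field.text or "").strip()
            elif kind == "datafield" and tag in fields:
                if tag == "650" and field.get("ind2") != "0":
                    continue  # not LCSH, e.g. FAST (ind2 "7", $2 fast)
                is_subject = tag.startswith("6")
                parts = [
                    (sf.text or "").strip()
                    for sf in field
                    if local(sf.tag) == "subfield"
                    and (not is_subject or sf.get("code") in SUBJECT_CODES)
                ]
                sep = " -- " if is_subject else " "
                fields[tag].append(sep.join(p for p in parts if p))
        row.update({tag: "; ".join(values) for tag, values in fields.items()})
        elem.clear()  # drop the finished record's children to limit memory use
        yield row

def write_csv(source, out, tags=TAGS):
    """Write the streamed rows to an open text file as CSV."""
    writer = csv.DictWriter(out, fieldnames=["001", *tags])
    writer.writeheader()
    writer.writerows(marcxml_rows(source, tags))

# Usage (hypothetical file names):
# with open("zines.xml", "rb") as src, \
#         open("zines.csv", "w", newline="", encoding="utf-8") as out:
#     write_csv(src, out)
```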
Starting a new UL now that I HAVE MY DATA.
- First up: adding a unique key column
- Removing dates outside 1990-2015 in a new sheet
- Concatenating 260 and 264 fields
- Sorting on 264 (date)
- A little bit of hand-cleaning, but not an outrageous amount.
- After I concatenated into 26x, I copied the data into a values-only column so I could delete the 260 and 264 columns.
- Lots of turning numbers into numbers
- Lots of zines have funky publication dates. I'm excluding records from my dataset whose date ranges span more than one lustrum.
- Fixing dates took forever.
- LCSH and genre Text to Columns went fine, but now do I have to make them long and skinny? Or will that make it look like there are as many zines as there are LCSH associated with them, i.e., make it seem like there are multiple copies of the same zine, just with different LCSH?
- 9:24pm - uploaded file to Tableau
- I made a key table
- And left joined it to each of the lustra: 1990-1994, 1995-1999, 2000-2004, 2005-2009, 2010-2014
- Thoughts and prayers
- Slept exceedingly late, dreaming about my treemap
- Also realized my key-only table shouldn't have been key-only. Having all the LCSH and genre terms in one table might also have been handy. Going to do that.
- Why does Tableau assume I want to join on the date, rather than on the key? It doesn't matter, it just seems strange. I guess date is a good join point, too. Should I join on both?
- I'm just doing things, and I don't understand why!
- I finally got something that looks like a treemap, but then it turned into one giant box.
- I don't know why my first 650 isn't full of headings.
- I need to get rid of all the OCLC-applied FAST headings.
- Set vs. group. Answer: hierarchy!
- I really can't handle how many steps there are to making anything happen. I want to ask for help, but first I have to save my current workbook to Tableau Public, and I keep getting an error message I have to troubleshoot. Aargh!
- The damn treemap is done. Sort of. Things I'd like to add/change:
  - LCSH instead of genre?
  - Click on a genre in the master table and see it on all the tables.
- I'm adding a column for lustra, which I think is sort of cheating, but…
- My big challenge right now is figuring out how to get the plus sign in a pill.
- Also, I didn't really need to join sheets, I don't think.
- How did we do that action on Arthur?
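The date and reshaping steps above (pulling a year out of the 26x date, dropping records whose ranges cross lustra, going long and skinny) can be sketched in Python. This assumes the input is just the date portion of 260/264 ($c), since place or publisher text could contain stray four-digit numbers; all names are hypothetical. Years are bucketed into the five lustra from 1990-1994 to 2010-2014, so a 2015 date or an imprecise one like "[199-?]" is excluded. The zine key rides along on every long row, so counts can be taken over distinct zines rather than over rows, which avoids the "multiple copies of the same zine" problem.

```python
import re

# The five lustra used in the workbook: 1990-1994, 1995-1999, ..., 2010-2014.
LUSTRA = [(start, start + 4) for start in range(1990, 2015, 5)]

# A 19xx/20xx year not embedded in a longer number: "c1996" and "[1996?]"
# match, while an imprecise "[199-?]" does not.
YEAR = re.compile(r"(?<!\d)(?:19|20)\d\d(?!\d)")

def lustrum_for(date_text):
    """Return a label like '1995-1999', or None when the date has no full year,
    falls outside 1990-2014, or spans more than one lustrum."""
    years = [int(y) for y in YEAR.findall(date_text)]
    if not years:
        return None
    first, last = min(years), max(years)
    for start, end in LUSTRA:
        if start <= first and last <= end:
            return f"{start}-{end}"
    return None  # out of range, or the range crosses a lustrum boundary

def long_rows(records):
    """Turn (key, date_text, headings) records into long and skinny
    (key, lustrum, heading) rows, dropping records with no single lustrum."""
    for key, date_text, headings in records:
        lustrum = lustrum_for(date_text)
        if lustrum is None:
            continue
        for heading in headings:
            yield key, lustrum, heading

records = [
    ("z1", "1994.", ["Feminism", "Rape"]),
    ("z2", "1994-1996", ["Punk culture"]),  # crosses lustra, so dropped
    ("z3", "c2003", ["Mental health"]),
]
print(list(long_rows(records)))
# [('z1', '1990-1994', 'Feminism'), ('z1', '1990-1994', 'Rape'), ('z3', '2000-2004', 'Mental health')]
```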