by Rebecca Repper
In July I had the pleasure of focusing on some training for my research project 'Reconnecting Western Australia's Photographic Collections'. I was not even six months into my PhD candidacy, but I am planning to look at a lot of collection metadata from at least four different institutions. I needed to make sure that what I was envisaging was possible (for me), and practical (with the time and resources I have). Therefore, the purpose of this training was to make sure that I understood the basic skills I would need to access and understand the collection data, and also the possibilities for processing that data so that I could ask the questions I needed to ask.
You might not think about it this way, but cultural heritage institutions like Libraries, Museums and Archives do not just collect items, such as books, paintings or artefacts, but they collect data - a lot of data. Each 'thing' in their collection has masses of associated information, such as its provenance, where it is from, who created it, who collected it, what the item is, what it is made out of, what is its size, what curatorial department it belongs to, what exhibitions it has been part of, what it depicts or talks about … the list goes on. As I said - LOTS OF DATA! And today, most of this data is stored and processed with computers. I need to make sure I can tell the computer what I want to do with all of that data.
My first stop was the lovely University of Oxford to attend the Digital Humanities Oxford Summer School (DHOxSS). Here I was enrolled in the course 'Humanities Data: A Hands-On Approach' which focused on tools, methods and concepts for managing, organizing, cleaning and processing data. This was very much an introductory level series of workshops designed to show participants just enough so they could identify what may or may not be of use. They started us with an HTML file (what you get when you are on a webpage, right click and 'view page source') and asked us to 'find the data'. I honestly could not find the data - it was an important lesson about not being scared to start familiarizing ourselves with code so that we can problem-solve our way forward in a digital world. Over the coming days we were introduced to the light and dark side of MS Excel (it can be your friend, and your darkest enemy), tried our hand at SQL, cleaned data in OpenRefine, visualized data in open access tools like 'FusionTables', and had a taster of Python. These practical workshops were balanced with presentations from current digital humanities projects (you can access them on their Oxford podcast series), and seminars with great information about planning and managing data projects, data structures, copyright and open access of data. I also had the opportunity to meet a lot of other people working with cultural heritage data and gained some amazing insight into the potential and challenges in our fields of interest. Although I did not learn all the answers and tricks to working with collection data in this week of training, I did learn which tools to focus on to start working effectively with my .csv and .xml files of collection data.
For the remainder of the month of July, I was placed with The British Museum ResearchSpace project. Here I practically learned how to map a dataset from its source to the CIDOC-CRM standard, with the help of Dominic Oldman and the ResearchSpace team, in such a way that it can be imported into the ResearchSpace platform. I am mapping data into CIDOC-CRM so that I may understand the data about photographs collectively instead of separately. CIDOC-CRM is an information standard developed by the CIDOC Documentation Standards Working Group and the International Council of Museums (ICOM) with the aim of expressing the full richness of cultural heritage materials' data, and is an official international standard (ISO 21127:2014). This 'richness' expressed by CIDOC-CRM should be ideal for photographs' data because photographs are collected across all types of collecting institutions. Both the State Library of WA and the WA Museum have some of their data available online - so I utilized this data for my training in mapping to CIDOC-CRM. I was surprised at how the mapping process challenged my assumptions about the collection data, and clarified the meaning of data fields. The difference in how the State Library and Museum created photograph 'records' was revealed tangibly through the mapping, despite there being many commonalities in the types of information recorded. I am looking forward to delving into the data more to understand these differences and commonalities in the coming years. Through this process I utilized the Google Docs App draw.io to visualize my mapping, OpenRefine to clean and process my datasets from spreadsheets to .xml files, and the 3M mapper tool. I will need to use Python to edit .xml files, but I am still working on my skills in this area.
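To give a sense of what editing an .xml file with Python can look like, here is a minimal sketch using the standard library's xml.etree.ElementTree module. It is only an illustration: the record structure, the element names (records, record, title, creator) and the sample values are all invented for this example and are not taken from the State Library or WA Museum data, and the function name clean_records is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample export. The element names and values are invented
# for illustration and do not come from any real collection dataset.
SAMPLE_XML = """<records>
  <record id="1">
    <title>  Example photograph A </title>
    <creator>unknown</creator>
  </record>
  <record id="2">
    <title>Example photograph B</title>
    <creator>Unknown </creator>
  </record>
</records>"""


def clean_records(xml_text):
    """Parse the XML text, tidy each record's fields, and return the root element."""
    root = ET.fromstring(xml_text)
    for record in root.iter("record"):
        # Strip stray leading/trailing whitespace from every field in the record.
        for field in record:
            if field.text is not None:
                field.text = field.text.strip()
        # Normalise the spelling of a placeholder creator value.
        creator = record.find("creator")
        if creator is not None and creator.text and creator.text.lower() == "unknown":
            creator.text = "Unknown"
    return root


root = clean_records(SAMPLE_XML)
# Serialise the edited tree back to text so it can be inspected or saved.
print(ET.tostring(root, encoding="unicode"))
```

For a real file, ET.parse("path/to/file.xml") followed by tree.write(...) would replace the in-memory string, and the field names would need to match whatever the exported dataset actually uses.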
My month concluded at the ResearchSpace Symposium and workshop 'Building Cultural Heritage Knowledge' where I presented alongside Dr. Toby Burrows, sharing my progress working with and understanding WA's photographic collections. You can read Toby's blog about the conference here on the CTW website. I was able to attend the Symposium thanks to a bursary funded by the conference funder the Andrew W. Mellon Foundation through ResearchSpace, and a UWA Graduate Research School Travel Grant. I can't thank them enough for helping me get to the U.K. for this training. I finished my month very much aware of how much digital methodologies are becoming an integral part of how we manage and research data, and how much there is to learn. The bells and whistles of the digital world can be quite distracting (I strongly recommend listening to some of Andrew Prescott's closing address at DHOxSS), but some of the presentations at DHOxSS and the ResearchSpace Symposium also reminded me of the very practical and necessary outcomes of some of these tools and projects. I feel like I have made a proactive leap in my own data methodology and research, and I strongly recommend that anyone wishing to work with collection data start delving and not be daunted by the digital world.