Wednesday 10 February 2016

I'll show you my research data if you show me yours...

My research data
A few months ago I was having a clear out at home and came across a bunch of floppy disks in the drawer of my bedside table.

This is my research data...

Actually, that is not strictly true. I did a taught masters course and my research consisted of just a short dissertation at the end of the course. Most of these disks contain files from the taught element of my course and the subsequent dissemination of results. 

At the end of the masters I published a paper on the findings of my dissertation. 

If you are interested in the placement of Iron Age hillforts in the landscape then this is the book to look for.
No-one has since approached me and asked if they can see the data that underlies this publication.

...but this was the 1990s! 

Times are different now. We expect our researchers to be able to produce the data and share it (where appropriate) so that others can build on their research. 

I'm now involved in teaching researchers here at York about Research Data Management (RDM) and how they should look after their data for future re-use.

When I created and stored this data I was not a digital archivist. I had no idea I would become a digital archivist. I like to think I would have managed my data differently if I had known more. 

Let's start with documentation. Much of the documentation for this data is what is written on the disk labels. I gave myself a little pat on the back for having recorded the contents of *most* of the floppies so well. This was particularly useful in those days: file names were restricted to the 8.3 format (a name of up to eight characters plus an extension of up to three), so very little detail about a file could be incorporated into its name. Documenting things on disk labels helps add a bit of context. All well and good until you notice the disk on the far right with no label at all. This one remains a mystery!
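
To make the 8.3 constraint concrete, here is a minimal sketch of a check for whether a name would have fitted. It is a simplification and an assumption on my part: the helper `fits_8_3` is hypothetical, and it only allows letters, digits, underscores and hyphens rather than the full character set DOS accepted. The example file name "hillfort_survey.doc" is made up for illustration.

```python
import re

# Hypothetical, simplified 8.3 pattern: a base name of 1-8 characters,
# optionally followed by a dot and an extension of 1-3 characters.
# Only letters, digits, underscore and hyphen are allowed here.
_SHORT_NAME = re.compile(r"[A-Z0-9_\-]{1,8}(\.[A-Z0-9_\-]{1,3})?", re.IGNORECASE)


def fits_8_3(name: str) -> bool:
    """Return True if `name` fits the simplified 8.3 pattern above."""
    # fullmatch ensures the whole name matches, not just a prefix
    return _SHORT_NAME.fullmatch(name) is not None


for name in ["DISTEX", "CV.DOC", "hillfort_survey.doc"]:
    verdict = "fits" if fits_8_3(name) else "too long for 8.3"
    print(f"{name!r}: {verdict}")
```

A descriptive name like the last one simply could not be stored, which is why cryptic abbreviations such as 'DISTEX' were the norm.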

So what are the issues here? First and most obviously, as a student in the 1990s I was using cutting-edge storage technology - the floppy disk! Can we read these today? Yes and no. Floppy disks fall firmly into the category of 'obsolete media', which is a topic that we digital archivists like to talk about. I found I could read about a quarter of these using the USB floppy reader that is attached to my PC. For the others I saw a lot of error messages like this:

The answer is "No"!

Fortunately I had more success using an old PC I keep in my office for the very purpose of reading old floppy disks - all but two of the floppies could be read and copied using this PC. On one disk I could view the list of files but couldn't copy all of them off the disk, so I considered this to be a partial success. The one disk I couldn't access at all was, interestingly, the one with no label. Perhaps this mystery disk was in fact never formatted or put into active use. 

Not too bad a result so far?

So what about the contents of the disks?

The contents of one of the floppy disks. Windows Explorer identifies the DOC files as Microsoft Word 97-2003 but they are likely to be an earlier version of Word than this.

As mentioned above, the file and folder naming is noticeably brief (as is the way with media from this period). Today we talk to our researchers about the importance of naming files so that you know what a file is before you double-click on it. This was nigh on impossible when faced with only eight characters. I created this data but have no idea what I might expect to find in a directory called 'DISTEX' (though the label on the disk does help give a clue).

Note too the lack of organisation of the contents. At the end of my masters degree whilst finishing off my papers and publications I was also clearly focusing on what my next steps would be. Personal data (my CV for job applications) is stored alongside data relating to my research*. This again is something we discourage when we talk to researchers about data management. It is much easier when working with filestore to organise and categorise data more effectively, keeping personal data separate from research data. We have come a long way since the days when we were squashing any files that would fit on to a floppy disk regardless of content or context.

Here is some data on another of the disks (viewed in Windows Explorer as tiles). I have no idea what possessed me to store scanned photographs as GIF images. They look terrible! Did they always look this bad? Choosing the right file format is something we also cover in our RDM training and though file size is still a consideration for today's research students, at least they don't have to try and fit numerous images for one presentation on a single floppy disk.

More coded file names - this was a necessity when you had so few characters available. I still remember what these mean but very much doubt anyone else would.


Some of my files are fairly easy to read, others less so (more detective work is required to find the right software). The Word documents are OK but open in 'Protected View' (which means I'm not allowed to edit them). By default, Word treats Word 6 and Word 95 documents with suspicion, but this can easily be resolved by editing these settings.

These old MS Word docs are still readable (and editable if I change the policy settings)

So, digging out my old research data has been an interesting diversion. I now use this as an example at the beginning of RDM teaching sessions and ask the students to imagine how their research data might look 20 years from now. 

Another bonus from this exercise is that I now have even more files to play with as I test Archivematica and file identification tools.
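
File identification tools (DROID and Siegfried, for example) look at a file's internal signature rather than trusting its extension. The sketch below illustrates that idea under heavy simplification: the `SIGNATURES` table and the `identify` / `identify_file` helpers are hypothetical, and they recognise only three signatures. Note that Word 6, Word 95 and Word 97-2003 documents all share the same OLE2 container signature, so a check this shallow cannot tell them apart; that is exactly where deeper inspection by dedicated tools is needed.

```python
from pathlib import Path

# Hypothetical signature table: (leading "magic" bytes, description).
SIGNATURES = [
    (b"\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1",
     "OLE2 compound file (e.g. Word 6/95/97-2003 .doc)"),
    (b"GIF87a", "GIF image (87a)"),
    (b"GIF89a", "GIF image (89a)"),
]


def identify(data: bytes) -> str:
    """Return a description for the first signature `data` starts with."""
    for magic, label in SIGNATURES:
        if data.startswith(magic):
            return label
    return "unknown"


def identify_file(path: Path) -> str:
    """Read just enough bytes from `path` to compare against SIGNATURES."""
    longest = max(len(magic) for magic, _ in SIGNATURES)
    with open(path, "rb") as f:
        return identify(f.read(longest))
```

For example, `identify_file(Path("SCAN01.GIF"))` (a made-up file name) would report a GIF regardless of what the extension claimed.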




*It is interesting to note that my first (unsuccessful) attempt to get a job in York was in 1998. I got here five years later!






Jenny Mitcham, Digital Archivist

Friday 5 February 2016

New "Filling the Digital Preservation Gap" report released

I am pleased to announce that we have just published a new report on the "Filling the Digital Preservation Gap" project.



Filling the Digital Preservation Gap. A Jisc Research Data Spring project. Phase Two report - February 2016 - Jenny Mitcham, Chris Awre, Julie Allinson, Richard Green, Simon Wilson. https://dx.doi.org/10.6084/m9.figshare.2073220.v1



This phase 2 report, funded through Jisc's Research Data Spring initiative, details the work the project team have carried out with Archivematica over the last few months of the project. 

Our phase 2 work had the following aims:
  • Work with Artefactual Systems to develop Archivematica in a number of areas to make the system better suited to our infrastructures for research data management
  • Develop our own detailed implementation plans for Hull and York to establish how Archivematica will be incorporated into our local infrastructures for research data
  • Consider how Archivematica could work as an above campus installation
  • Continue to spread the word, both nationally and internationally, about the ongoing work of our project

Our work in all of these areas is detailed in full in the report. Please do download it and let us know what you think.

We very much hope that the new features we have sponsored within Archivematica will be of interest to other Archivematica users (both current and future) and that these features will continue to evolve and improve over time.



Jenny Mitcham, Digital Archivist

The sustainability of a digital preservation blog...

So this is a topic pretty close to home for me. Oh the irony of spending much of the last couple of months fretting about the future prese...