Institutional repository?

Throughout the project, we have repeatedly run into the problem of defining just what we mean when we refer to an institutional repository. It turns out that we have not been consistent about this, even within our own project. The problem is magnified when we try to assert our definition across the 20 participating HEIs in Scotland, each of which has its own definition (and understanding) of what an institutional repository actually is.

The project team has therefore asked me to come up with a working definition that we can apply to the project.

I’ve been looking at definitions from a number of sources, and have even put the question to the wider community on the JISCPM answers site, but I will start here with the definition on Wikipedia (as good a place as any to start :-)).

It states:

“An Institutional Repository is an online locus for collecting, preserving, and disseminating — in digital form — the intellectual output of an institution, particularly a research institution.

For a university, this would include materials such as research journal articles, before (preprints) and after (postprints) undergoing peer review, and digital versions of theses and dissertations, but it might also include other digital assets generated by normal academic life, such as administrative documents, course notes, or learning objects.

The four main objectives for having an institutional repository are:

  • to create global visibility for an institution’s scholarly research;
  • to collect content in a single location;
  • to provide open access to institutional research output;
  • to store and preserve other institutional digital assets, including unpublished or otherwise easily lost (“grey”) literature (e.g., theses or technical reports).”

And do I agree with this?

In the above definition, the Institutional Repository is defined as the ‘locus’ for outputs. I just don’t believe this to be the case. The institutional repository should not, as a general rule, be treated in the singular.

It’s the same problem that I came across in my previous work at the National Library when trying to describe what a ‘Trusted Digital Repository’ (TDR) was: the ‘traditional’ Library is in effect a trusted print repository, one that has been developed over many years and comprises many services, facilities and specialisms. Without fail, staff assumed that a TDR was a thing that you bought or installed, rather than an objective to be met in the future, following a period of intense internal technological and cultural change.

The repository services that an institution provides bring together all the components that act as enablers for the institution’s overall policies of collection, preservation and dissemination. This is the Institutional Repository. In this repository landscape, the term institutional repository as currently used more truly describes a single element of these services: the provision of an enabling system that could more readily be described as an ‘open access repository’, designed and operated on the basis of maximising its acquisition of full-text holdings of ‘formal’ research outputs and making them easily discoverable and openly accessible.

Nor can we assume that the institutional repository is the single store for the objects listed in the definition above: ‘materials such as research journal articles, before (preprints) and after (post-prints) undergoing peer review, and digital versions of theses and dissertations, but it might also include other digital assets generated by normal academic life, such as administrative documents, course notes, or learning objects’.

Many institutions maintain separate systems to manage different object ‘types’, such as post-print materials or e-theses, for their own independently valid reasons, often associated with workflows, or with security and accessibility constraints arising from copyright/IPR issues. Equally, some institutions choose to combine this material in a single system, based on their own needs and requirements.

Definitions as used in the ERIS Project

The ERIS Project (Enhancing Repository Infrastructure in Scotland) has unfortunately generated a mixed definition of repositories.

Take the example of the first of the ERIS project objectives:

‘the project has the core objective of enhancing the level of researchers engagement with repositories, with a view to achieving a more sophisticated understanding of what repository functionality is needed.’

This is a good example of a problem that we have set ourselves.

For this objective, the project has embarked on the assumption that we are referring to the narrower definition of institutional repositories as singular systems designed to maximise the holdings of more ‘formal’ full text research outputs.

Note: what we find from our interaction with users is that one of the primary ways to enhance their levels of engagement will be to ensure that there is a wide breadth of integrated services encompassing all repository ‘types’, of which the open access, full text repository is just one. The ability to link records and objects together from across this landscape of solutions and services is important, and will become increasingly so.

The second of our objectives – ‘to enhance curation and preservation processes within institutions with a view to strengthening the credibility about the longevity of repositories amongst researchers’ – takes the much broader definition of repositories as being part of an overall landscape supporting research.

This broader view has, as it happens, always been the project’s position, but we have caused some confusion in the community by also relying on the narrower, assumed definition used for the first objective. Hence the need to create a preservation policy framework, which allows the core policy to be adapted as required by users, relative to their circumstances.

Recommendations to the project

So, to conclude: my recommendation to the project is that, as far as possible, we avoid using the term institutional repository when referring to a singular repository set-up. Because the term is ambiguous, Institutional Repository should instead refer to the ‘repository landscape’ that makes up the research repository services provided by an institution overall, an approach which allows for individual scoping as required.

I will also recommend that, in place of the currently assumed definition of institutional repository, we refer to the ‘open access repository’: one designed and operated on the basis of maximising its acquisition of full-text holdings of ‘formal’ research outputs and making them easily discoverable and openly accessible. This will not be a one-size-fits-all definition, but it is explicit enough for the project to use when needed.

I will present this argument to the project delivery team at our next meeting (9th March) and let the discussion ensue. I expect to update this post over the following couple of days as a result.


Fedora UK and Ireland User group meeting

I’ve just come back from Oxford, where I’ve been presenting some early findings from the project to a group of Fedora users from across the EU (plus a few from the States and Australia). I was invited to speak at the meeting by organiser Chris Awre as part of the Duraspace initiative called the Scholars Workbench community, which has been established to:

‘undertake its own scholarship in this area. It will gather existing information on how scholars use and generate information, and how they manage it, capturing experience that can be shared.’

I felt that the presentation I gave was a bit of a curve ball for the attendees, as there was no technical content, but it was a good opportunity to find out how much is known outside the Scottish HE sector about the research pooling initiatives in Scotland and how successful they have been. I have been surprised in the past by how scant this knowledge is and, given the questions I answered during the lunch break after the presentation, it seems that this is still the case.

The general thrust of the presentation, which can be found here on Slideshare, was to describe the context of the project, position research pools and then speak a little about the initial observations that we’ve made.

These observations centred on the drivers that the pools have for recording details of the research produced within the pool, for purposes of reporting, administration and measurement. I also commented on the very strong sense of community that the pools have developed, and how valuable an asset this could be in gaining buy-in from members to deposit their published (and potentially unpublished) outputs in support of collaboration and knowledge transfer.

So, overall, a good opportunity to provide some useful background to a lot of well-known folks in the repository world.

ERIS Mention in latest Ariadne

Matfen Hall, Northumberland. Photo courtesy of Dominic Tate©, SHERPA/RSP, University of Nottingham.

The project got a nice mention in the write-up by Stephanie Taylor of UKOLN, in the October edition of Ariadne, of the recent (and very enjoyable) Repository Support Project Summer School event, held at Matfen Hall in Northumberland in early September of this year.

Please have a read – the article can be found here

Article Title: “The RSP Goes “Back To School””
Author: Stephanie Taylor
Publication Date: 30-October-2009
Publication: Ariadne Issue 61
Originating URL:

Some initial thoughts around repositories and research pools

One of the primary objectives of the project is to look at the specific needs of research pools and how they, as users, will need to engage with repositories. I thought it might be of use to put down some of our initial findings in a blog post.

We have now met with 6 research pools involved with the project, and the initial findings are very interesting. It’s made us look outside of ‘normal’ repository thinking, and we’ve identified a number of potential crossovers into other JISC-led projects, and also into EuroCRIS and its CERIF metadata format.

The first challenge with research pools is that they are made up of multiple institutions, yet effectively operate as independent entities, investing in resources (both human and facilities/equipment). To support their ability to report effectively to their paymasters, the funding councils and member institutions, they need to have specific policies in place to govern how their data should be managed and presented. This immediately causes a conflict between the pool and its member institutions, who all have their own policies and requirements. The project has to support both the institutions and the pools in this respect, so we in effect become a facilitator of needs.

The other interesting element is the direction of deposit for new materials when considering research pooling. Each institution has its own efforts to support the use of IRs in managing its research outputs, and this is mostly a mediated activity, which is not without its own challenges. We’ve spent a lot of time trying to find out how to motivate researchers to deposit at the local level, and on the arguments for using IRs, but deposit (even post-mandate deposit) is still very low. A lot of this would seem to come down to the identified benefits to the individual, as opposed to the benefits to the institution.

The research pools, however, are able to identify a considerable benefit for their members in providing information on their research outputs: the pool’s chance of survival past the end of its initial investment period. As such, if the research pools ask, their members tend to deliver. At the moment the pool members produce publication lists and send them in to the pool administrators for collation and reporting. This is a very time-intensive process, carried out independently of current IR efforts, and it tends not to consider the creation of a formal publications repository with associated OA/full text, or the benefits that flow from increased exposure and availability.

So, we are now considering the following opportunity. For the ERIS Project to provide a mediation service for the pools, we would need to take the lists of their research outputs, compare them against the harvested outputs from all the existing institutional repositories, and then gather and feed information back down into the IRs from the central database. The central service would add the research pool items to a collection level description, acquire the full text versions of the outputs and then feed them back down to the individual institutions’ repositories: a sort of reverse logistics exercise. In this case the central repository is king, and the pools would be able to generate the reports they need easily and smooth their ROI management with their funders.
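To make the comparison step a little more concrete, here is a rough sketch in Python of how a pool's publication list might be checked against records harvested from the institutional repositories. It is purely illustrative: the `Record` fields, the matching rule (DOI first, then a normalised title) and the three report buckets are my own assumptions, not anything the project has specified or built.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    """A hypothetical, minimal description of one research output."""
    title: str
    doi: Optional[str] = None
    repository: Optional[str] = None  # which IR the record was harvested from
    has_full_text: bool = False


def doi_key(doi):
    # DOIs are case-insensitive, so compare them in lower case.
    return doi.strip().lower()


def title_key(title):
    # Keep only letters and digits so punctuation and case do not block a match.
    return " ".join(re.findall(r"[a-z0-9]+", title.lower()))


def build_index(harvested):
    """Index harvested records by DOI (when present) and by normalised title."""
    index = {}
    for rec in harvested:
        if rec.doi:
            index.setdefault(("doi", doi_key(rec.doi)), []).append(rec)
        index.setdefault(("title", title_key(rec.title)), []).append(rec)
    return index


def reconcile(pool_list, harvested):
    """Sort each item on the pool's list into one of three buckets."""
    index = build_index(harvested)
    report = {"held_with_full_text": [], "held_metadata_only": [], "not_held": []}
    for item in pool_list:
        matches = []
        if item.doi:
            matches = index.get(("doi", doi_key(item.doi)), [])
        if not matches:
            matches = index.get(("title", title_key(item.title)), [])
        if not matches:
            report["not_held"].append(item.title)
        elif any(m.has_full_text for m in matches):
            report["held_with_full_text"].append(item.title)
        else:
            report["held_metadata_only"].append(item.title)
    return report
```

In this sketch, the "not_held" and "held_metadata_only" buckets would be the items for which the central service still needs to acquire full text before feeding anything back down to the institutions. Real matching would need to be more forgiving than this (author names, publication years, near-identical titles), so treat it as a starting point only.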

In addition, the pools would like to be able to identify the research that has been produced as a result of capital investment in services and facilities. The project could also help the pools here by producing a CERIF-XML crosswalk from the core harvested data, which would allow resources to be associated with research projects and facilities in a common format that could also link research outputs up with the rest of Europe.

As well as all of the above, the pools have identified the need to create grey literature repositories linked to the final published outputs of their research, in order to build cross-pool expertise and knowledge in areas such as impact planning, where some projects have great exemplars of documentation and planning that made those projects a success but wouldn’t necessarily be cited in a publication.

Anyhow, we start the formal assessment of our findings at the beginning of December, so expect some more thoughts from us around that time.


ERIS Project Survey of Curation and Preservation policies across Scotland’s HEI

The ERIS Project has today launched a survey designed to establish the current level of curation and preservation policies in place across the repository landscape in Scottish HEI.

The survey can be found at and we would be most grateful if you could find the time to complete it. It will remain open until 17:00 on 27th November.

The project has invited members of the Scottish Repository Managers group, the Scottish Consortium of University & Research Libraries (SCURL) and the Higher Education Information Directors (HEIDs) to complete the survey. We welcome multiple responses per institution.

To help respondents review the survey prior to completion, we have made a copy of the survey text available for printing/download via our Scribd account.

image from Flickr - attributed to 'Tall Chris'

I’m looking to set a common definition for ‘institutional repository’ in the context of the ERIS Project. It’s been the cause of some confusion in a number of areas, so I’m hoping that we can come up with something that does the job across the board.

For example, the beloved Wikipedia describes Institutional Repository as:

‘An Institutional Repository is an online locus for collecting, preserving, and disseminating — in digital form — the intellectual output of an institution, particularly a research institution.

For a university, this would include materials such as research journal articles, before (preprints) and after (postprints) undergoing peer review, and digital versions of theses and dissertations, but it might also include other digital assets generated by normal academic life, such as administrative documents, course notes, or learning objects.’

Now, if we were to adopt this definition within the project, then we would (potentially) be bringing learning objects into scope, which I have always seen as being outside of the project. But not all institutions are equal: each has one or many repositories, holding varying types and amounts of content. For a project which deals with 20 HEIs, just explaining what we are referring to when we talk about IRs is causing a bit of a headache.

So, suggestions on a postcard if you can. How have other projects dealt with this issue? Has it even been an issue?


SCURL Repository managers event

On the 24th September, we held a meeting of the SCURL Repository managers group in conjunction with the ERIS project at the National Library of Scotland in Edinburgh. There were 20 attendees from across Scotland, and we had a pretty good geographical spread. There were some obvious institutional gaps, unfortunately, but not everyone can be available at the same time!

There were a couple of different reasons for the meeting. The first was to get everyone together and to ‘re-launch’ the Scottish repository managers meetings, there having been an initial meeting 18 months ago but nothing since.

Wearing my project hat, though, we wanted the opportunity to get together with as many repository managers as possible to talk about the project in general and, more specifically, to run through the preservation policy survey put together by the team at the Digital Curation Centre (DCC) on behalf of work package 2, which they lead.

The survey is a key part of the first objective in this work package, which is looking to develop a recommended policy framework for digital curation policy, based on DCC tools.

We started the meeting off by talking about the value of the repository managers forum in Scotland, in particular the need for us to work together and be creative in finding solutions to repository problems. This was one of the key themes of the recent Repository Fringe Event – written up here.

The attendees then heard from me, with a general overview of the ERIS project (see below) and its aims and objectives, in which I’m looking to communicate the value of taking an infrastructure-level view of repositories in Scotland, and to ask for help from those who are responsible for repository work across the country.

We then heard from the DCC, who presented an overview of DCC services, aims and objectives, and of the phase 3 plans that they are now working towards. Presentation below:

After coffee came the main event, where Martin Donnelly from DCC walked through the survey questions with the group, so that we could elicit responses and ‘fine tune’ the questions. We weren’t sure how this was going to go, but in the end we got some very encouraging feedback, albeit not necessarily in the areas that we expected. The chief issues seemed to be around assumptions about what constitutes a repository within an institution, who manages or is responsible for repositories, and how policies for preservation are set, either by or on behalf of repositories.

An awful lot to think about, but since then the team at DCC have been revising the survey to clarify a number of points and to make sure that the rationale behind it is clear to all those who are asked to respond. I was also pleased that those who attended wanted as much opportunity as possible to write narrative in support of their answers; whilst this can sometimes be tricky to analyse, it will add real weight to the final output.