
Archive digital content

Ensure that your content is safe in the long term

Archiving digital content that is no longer actively used by you or your team will ensure that it is stored securely to prevent data loss in the long term. Proper archiving will ensure continued access for you, or for others if you choose to publish or share your digital content.

Archiving options for research data
Organise your files
  • Organising your files well helps them remain findable and understandable, both for you and for other users. Meaningful file names and a deliberately organised folder structure, applied throughout your project, make data and other digital content easy to archive at the end. If your files don’t have meaningful file names or aren’t well organised, fix this before archiving anything.


    File naming

    Select an appropriate naming convention for your files as early as possible and follow it consistently throughout your research. Make sure your naming convention is documented, for instance, in a README file, before archiving your content.

    • You may wish to start your file name with the date, formatted as YYYYMMDD, so that your files display in chronological order
    • Choose useful keywords that you or others might use to search for your files, separating each word or section with a hyphen or underscore. Document the keywords you choose to use, so that you can interpret your file names later. Useful keywords may include:
      • project acronym
      • location
      • data type
      • data collection methods

      Example: [Date]_[Project]_[Location]_[Method]_[Run]
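
As an illustration only, the pattern above can be generated programmatically so that every file follows it consistently. This is a minimal sketch: the function name `build_file_name`, the project acronym "SOILMAP", and the zero-padded "run" prefix are hypothetical choices, not part of the guidance.

```python
from datetime import date


def build_file_name(collected: date, project: str, location: str,
                    method: str, run: int, extension: str) -> str:
    """Build a name following [Date]_[Project]_[Location]_[Method]_[Run]."""
    sections = [
        collected.strftime("%Y%m%d"),  # YYYYMMDD sorts chronologically
        project,
        location,
        method,
        f"run{run:02d}",               # assumed convention: zero-padded run number
    ]
    # Hyphens join words inside a section; underscores separate sections.
    cleaned = [section.strip().replace(" ", "-") for section in sections]
    return "_".join(cleaned) + "." + extension.lstrip(".")


# Hypothetical example values.
print(build_file_name(date(2021, 3, 5), "SOILMAP", "Camden", "core sample", 3, "csv"))
# prints 20210305_SOILMAP_Camden_core-sample_run03.csv
```

Whatever convention you settle on, record it in your README so that the sections of each name can be interpreted later.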

    Things to avoid

    • Don’t manually change or delete the file extension (e.g. .docx, .pdf, .csv), which is usually generated automatically
    • Avoid special characters such as \ / : * ? " < > | in file names; hyphens and underscores are fine
    • Don’t make file names too long
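
A small check along these lines could be run over a folder before archiving. It is a sketch under stated assumptions: the function name `check_file_name` is hypothetical, and the 60-character limit is an arbitrary placeholder, since the guidance does not give a specific maximum length.

```python
import os
from typing import List

FORBIDDEN = set('\\/:*?"<>|')  # the special characters listed above
MAX_LENGTH = 60                # assumed limit; choose one that suits your systems


def check_file_name(name: str) -> List[str]:
    """Return a list of problems found in a single file name (empty if none)."""
    problems = []
    bad = sorted(ch for ch in set(name) if ch in FORBIDDEN)
    if bad:
        problems.append("special characters: " + " ".join(bad))
    if not os.path.splitext(name)[1]:
        problems.append("missing file extension")
    if len(name) > MAX_LENGTH:
        problems.append(f"longer than {MAX_LENGTH} characters")
    return problems


print(check_file_name("20210305_SOILMAP_Camden_run03.csv"))  # prints []
print(check_file_name("results?.csv"))  # prints ['special characters: ?']
```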

    Folder structure

    A well-organised folder structure can save you time and will help future users to find and understand your digital content.


    Key considerations:

    • Keep any raw data in a separate folder from working data
    • Store any consent forms separately for ethics and privacy reasons
    • Nest your folders in the order that best suits how you plan to use them, e.g. Location > Method > Date or Method > Date > Location
    • Don’t create too many empty folders ahead of time

    Example: [Project] > [Experiment] > [Instrument or Type of file] > [Location]
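
One way to apply these considerations is to build each folder path from its parts and create it only when it is first needed, rather than creating empty folders ahead of time. The sketch below is illustrative: the function name `data_folder`, the extra "raw"/"working" level, and the example values are assumptions, not part of the guidance.

```python
import tempfile
from pathlib import Path


def data_folder(root: Path, project: str, experiment: str,
                file_type: str, location: str, raw: bool = True) -> Path:
    """Return (and create on demand) [Project] > raw|working > [Experiment] > [Type] > [Location]."""
    stage = "raw" if raw else "working"  # keeps raw data apart from working data
    folder = root / project / stage / experiment / file_type / location
    folder.mkdir(parents=True, exist_ok=True)
    return folder


# Demonstrated in a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    folder = data_folder(Path(tmp), "SOILMAP", "exp01", "spectrometer", "Camden")
    print(folder.relative_to(tmp).as_posix())
    # prints SOILMAP/raw/exp01/spectrometer/Camden
```

Consent forms are deliberately left out of this structure, since they should be stored separately.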


Describe your content
  • Before archiving digital content, make sure it is accurately documented and described, so that you and any future users can make sense of it and understand how it was created, for example during data collection, processing and analysis. If you are archiving to a location that allows access by or sharing with other users, well-described content will also be easier for others to discover and reuse.


    Metadata

    Metadata is descriptive and contextual information about your content. It may include title, creator(s), date produced or collected, location, abstract, subject, method, process, quality, format, rights, and ownership.

    To determine what metadata to keep, think about what information would help someone else understand and reuse your digital content. Consider using a metadata standard (also called a metadata schema): a defined set of fields, which can be either general or discipline-specific. Using a standard not only provides a rich description of your content but also increases the likelihood that people will find it.

    If you’re not sure where to start, check out Dublin Core, which is a commonly used general metadata standard, or find a metadata standard related to your discipline by searching the Digital Curation Centre’s disciplinary metadata directory.


    Creating documentation

    Once you’ve decided what metadata you need to collect and keep, you should record this information and store it with the digital content. Some storage systems, like the University’s eNotebook, provide mechanisms for you to do this when you save your content. In other storage systems, like the Research Data Store and CloudStor, you may have to record your metadata manually in a README document (a text document) or a version control table.
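
Where metadata has to be recorded manually, a README can be generated from a simple set of fields so that nothing is forgotten. The sketch below uses a subset of Dublin Core element names (title, creator, date, description, subject, format, coverage, rights); the choice of subset, the function name `render_readme`, and all example values are hypothetical.

```python
# A subset of Dublin Core elements, treated here as required (an assumption).
README_FIELDS = ["title", "creator", "date", "description",
                 "subject", "format", "coverage", "rights"]


def render_readme(metadata: dict) -> str:
    """Render metadata as plain 'Field: value' lines, refusing incomplete records."""
    missing = [field for field in README_FIELDS if not metadata.get(field)]
    if missing:
        raise ValueError("missing metadata: " + ", ".join(missing))
    lines = [f"{field.capitalize()}: {metadata[field]}" for field in README_FIELDS]
    return "\n".join(lines) + "\n"


# Hypothetical example record.
example = {
    "title": "Soil core measurements",
    "creator": "A. Researcher",
    "date": "2021-03-05",
    "description": "pH readings from soil cores at 10 cm depth intervals",
    "subject": "soil chemistry",
    "format": "text/csv",
    "coverage": "Camden, NSW",
    "rights": "CC BY 4.0",
}
print(render_readme(example))
```

Saving the result as a plain text file (for example README.txt, UTF-8 encoded) alongside the data keeps the description and the content together.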


File formats
  • When preserving and publishing digital content it’s essential that you save your work in an appropriate file format to ensure long-term accessibility. The file formats you use when working with your data may not be appropriate for archiving or publishing purposes. You should think about capturing data or converting files into formats that are:

    • widely used within your discipline
    • publicly documented, i.e. the complete file specification is publicly available
    • open and non-proprietary
    • endorsed by standards agencies such as the International Organization for Standardization (ISO)
    • self-documenting, i.e. the file itself can include useful metadata
    • unencrypted
    • uncompressed, or compressed using a lossless method

    For example:

    • Quantitative research data
      • While you collect and analyse your data, you might need it in a number of different formats: an Excel spreadsheet, a database, or a file format native to the data analysis software you are using, such as SPSS, SAS, R or MATLAB.
      • Once your analysis is complete, save the data as a comma separated values (.csv) file for long-term storage. Most data software packages provide options for saving as a comma separated values file. This format is portable across different computing and software platforms and is therefore more resilient to software updates.
    • Image files
      • Uncompressed TIFF (Tagged Image File Format) is a good choice for long-term preservation of image files. Most image creation software packages provide options for saving images as TIFF files. Save your image files in this format from the outset, so that you capture the highest possible quality master image files.
      • While working with your images, you may need to manipulate, share, or embed them in other documents. For these purposes it may be useful to compress your image files into JPEG format so that they’re smaller and easier to send over the internet or embed in analysis project files.
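
For the quantitative data case, most analysis packages have their own export option, but if your data is already held in a script, the export to .csv can be sketched with Python's standard `csv` module. The function names and the sample rows below are hypothetical.

```python
import csv
import io


def rows_to_csv(rows, field_names):
    """Serialise a list of dicts to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=field_names, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def save_csv(path, rows, field_names):
    """Write the CSV as UTF-8 text; newline='' leaves line endings to the csv output."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        handle.write(rows_to_csv(rows, field_names))


# Hypothetical sample data.
FIELDS = ["site", "depth_cm", "ph"]
SAMPLE_ROWS = [
    {"site": "Camden", "depth_cm": 10, "ph": 6.4},
    {"site": "Camden", "depth_cm": 20, "ph": 6.1},
]
print(rows_to_csv(SAMPLE_ROWS, FIELDS))
```

A .csv file does not carry units or column descriptions, so document these in the accompanying README.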

    The following table provides general suggestions for suitable file format choices for long-term preservation of digital content. For more specific recommendations, please contact library.digitalcollections@sydney.edu.au. We can provide advice on which file formats to use for long-term preservation, for sharing with collaborators and for access copies of your content, as well as when and how to convert to these formats.

    Content type and preservation format(s):

    • Archive
      • ZIP File Format (.zip)
    • Audio
      • PCM Wave Format (.wav), minimum 16-bit/44.1 kHz
      • Broadcast Wave Format (.bwf), minimum 24-bit/48 kHz
    • Images
      • Tagged Image File Format (.tif, .tiff)
    • Tabular Datasets
      • Comma Separated Values (.csv)
      • Microsoft Excel (.xlsx)
    • Text
      • Plain Text (UTF-8) (.txt)
      • PDF/A (.pdf)
      • PDF/A-3 (.pdf)
    • Video
      • Audio Video Interleave (.avi), uncompressed
      • MPEG-4 (.mp4), codec: ProRes, H.264, AVC; audio: stereo AAC
      • MOV (.mov), codec: ProRes, H.264, AVC; audio: stereo AAC
      • JPEG2000 OP1a MXF (.mxf)
      • FFV1 Matroska (.mkv)
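
As a rough first pass before archiving, file extensions can be compared against the formats listed above. This is only a sketch: the function name `preservation_category` is hypothetical, and an extension alone cannot confirm that a .pdf is actually PDF/A or that a video uses one of the listed codecs.

```python
import os
from typing import Optional

# Extensions taken from the list above, grouped by content type.
PRESERVATION_FORMATS = {
    "Archive": {".zip"},
    "Audio": {".wav", ".bwf"},
    "Images": {".tif", ".tiff"},
    "Tabular Datasets": {".csv", ".xlsx"},
    "Text": {".txt", ".pdf"},  # .pdf still needs checking for PDF/A conformance
    "Video": {".avi", ".mp4", ".mov", ".mxf", ".mkv"},  # codec not checked here
}


def preservation_category(file_name: str) -> Optional[str]:
    """Return the content type if the extension is a listed preservation format, else None."""
    extension = os.path.splitext(file_name)[1].lower()
    for category, extensions in PRESERVATION_FORMATS.items():
        if extension in extensions:
            return category
    return None


print(preservation_category("20210305_SOILMAP_Camden_scan.TIFF"))  # prints Images
print(preservation_category("field-photo.jpg"))                    # prints None
```

Files that return None are candidates for conversion, or for a question to the contact address above.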


Retention periods