Years of rapid digitisation across South Australian institutions left a sprawling mess of replicated image files; here's the chain of decisions that created the problem and the push to clean it up.
South Australian cultural and government institutions are midway through a years-long effort to purge hundreds of thousands of duplicate image files from public digital archives — a problem that traces back to a digitisation boom that began in earnest around 2015 and accelerated sharply during the 2020 COVID lockdowns. The scale of the redundancy has quietly inflated storage costs, slowed public search tools, and complicated the metadata work that underpins projects from the State Library of South Australia on North Terrace to the digital components of the Lot Fourteen innovation precinct on the old Royal Adelaide Hospital site.
The issue matters now because several of those institutions are rebuilding or migrating their content management systems ahead of new federal interoperability requirements that take effect in mid-2027. Cleaning the archives before migration is widely understood to be cheaper than cleaning them after. An unchecked duplication rate also distorts the discovery algorithms used by public-facing portals, meaning a researcher searching for, say, a specific 1970s photograph of Port Adelaide's Birkenhead Bridge can be returned dozens of near-identical files with slightly different file names and competing metadata tags.
How the Duplicates Accumulated
The root cause is straightforward: different teams, working at different times and often under separate funding streams, digitised the same physical collection items without checking what already existed. The State Records of South Australia, based at Gepps Cross, and the History Trust of South Australia, which operates the Migration Museum on Kintore Avenue and the South Australian Maritime Museum at Port Adelaide, both received separate state grants for digitisation projects between 2017 and 2022. Coordination was limited. When a physical photograph or document existed in more than one collection — as archival items frequently do — it was digitised multiple times, sometimes by different contractors using different resolution standards and file formats.
The problem compounded when institutions uploaded assets to shared platforms or exchanged files informally during the pandemic. Remote work in 2020 and 2021 meant staff were pulling files from network drives, re-scanning from home flatbed scanners, and re-uploading without the deduplication checks that would normally occur in a supervised digitisation lab. By 2023, internal audits at more than one institution had flagged the problem. Sector-wide estimates discussed at the 2024 Australian Society of Archivists conference in Adelaide suggest that between 20 and 35 percent of image files in certain collections had at least one near-duplicate stored elsewhere in the same system.
The Path to a Fix
The practical response has taken two forms. The first is automated: hash-matching software, which computes a digital fingerprint for each image file and flags files whose fingerprints are identical or nearly so, has been deployed across several SA collections since late 2024. The Lot Fourteen precinct, which hosts a cluster of data and tech companies alongside the Australian Space Agency, has become a testing ground for AI-assisted image deduplication tools developed by local startups. At least two companies operating from Lot Fourteen's co-working facilities have secured pilot contracts with SA Health and the Department for Education to run deduplication passes across administrative image libraries.
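For readers curious how hash matching works in principle, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the software the institutions deployed: the file names, byte contents and pixel grids are hypothetical, SHA-256 stands in for whatever exact-match hash a real tool uses, and the "average hash" shown is one common simple technique for catching near-identical images.

```python
import hashlib
from collections import defaultdict

def exact_duplicate_groups(files):
    """Group file names whose bytes are identical, using SHA-256 digests."""
    groups = defaultdict(list)
    for name, data in files.items():
        groups[hashlib.sha256(data).hexdigest()].append(name)
    # Only digests shared by two or more files indicate duplicates.
    return [sorted(names) for names in groups.values() if len(names) > 1]

def average_hash(pixels):
    """Fingerprint a grayscale grid: '1' where a pixel is brighter than the mean."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    return "".join("1" if value > mean else "0" for value in flat)

def hamming(a, b):
    """Count the positions at which two equal-length fingerprints differ."""
    return sum(x != y for x, y in zip(a, b))

# Hypothetical archive: two byte-identical files under different names.
files = {
    "bridge_1974.tif": b"same-bytes",
    "birkenhead_copy.tif": b"same-bytes",
    "wharf_1981.tif": b"different-bytes",
}
exact_groups = exact_duplicate_groups(files)

# Hypothetical 2x2 grayscale scans: scan_b is a slightly brighter re-scan
# of scan_a, while unrelated shows a different image.
scan_a = [[10, 200], [220, 30]]
scan_b = [[12, 190], [225, 35]]
unrelated = [[200, 10], [30, 220]]

near_distance = hamming(average_hash(scan_a), average_hash(scan_b))
far_distance = hamming(average_hash(scan_a), average_hash(unrelated))
```

An exact hash only catches byte-for-byte copies, which is why a renamed file is flagged but a re-scan at a different resolution is not. The second fingerprint tolerates small brightness differences, so the re-scan lands at a small distance from the original while the unrelated image lands far away. A real tool would first shrink each image to a fixed small grid and choose a distance threshold below which two files are treated as near-duplicates.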
The second response is governance-driven. The State Government updated its Digital Records Management Policy in March 2025, requiring all agencies to document a deduplication workflow before commencing any new digitisation project. The change was partly a response to the findings of a cross-agency review commissioned by the Department of the Premier and Cabinet in 2023.
For institutions, the immediate priority is completing deduplication before system migrations begin. The State Library's Newspaper Digitisation Program, which has been running since 2007 and covers titles including The Register and The Advertiser from as far back as the 1840s, is one of the largest single collections requiring a clean-up pass. Librarians have until the end of 2026 to complete that work under the current project timeline. For anyone relying on those archives — historians, journalists, genealogists visiting the reading rooms on North Terrace — a cleaner, faster, more accurate search tool is the end result of what has been, behind the scenes, an unglamorous but necessary reckoning with a decade of digital sprawl.