Python-based code for analysing the quality of metadata provided by
academic publishing sources such as CrossRef.

Quick, rough guide:

#----
1. Download the CrossRef Dataset:
#----

The CrossRef dataset is available via AcademicTorrents:

  https://academictorrents.com/details/d9e554f4f0c3047d9f49e448a7004f7aa1701b69

At the time of writing, the dataset page doesn't link forward to
newer releases. To keep an eye out for new releases, search Google for
"Public Data File from Crossref":

  https://www.google.com/search?channel=fs&client=ubuntu&q=+Public+Data+File+from+Crossref+

Transmission is a Torrent client available on the Ubuntu Linux machines
in CMS, and can be used to download the JSON files.

If you are looking for a non-admin way to run a Torrent client, you can
download Vuze for Linux from:

  https://www.vuze.com/download.php

(downloads as VuzeInstaller.tar.bz2)

Untar the download, and then run:

  ./vuze

#----
2. Setup a MongoDB Server
#----

Get going with a MongoDB server, for example:

  svn co https://svn.greenstone.org/gs3-extensions/mongodb/trunk mongodb

Then follow the instructions in mongodb/README.txt

--
Additional Notes:
--

Studio 3T is a GUI client for MongoDB. It can be downloaded from:

  https://studio3t.com/

It can be run with a free open source license (formerly Robo 3T), but
since the move to Studio 3T the developers start you off in the trial
Pro/Ultimate version, so there are a few hoops to jump through to get
to the open source version. You need to sign up for an account as part
of the installation process (you can use Google Sign-in, which
simplifies things). Back in the GUI, you can then change its
configuration settings straightaway to the Free 3T version.

In the GUI, set up/open a connection for:

  mongodb://localhost:27017

#----
3. Working with the Python code
#----

Create your own Python virtual environment, for example:

  python3 -m venv my-python3
  source my-python3/bin/activate
  pip install wheel

To get going with the Python code itself:

  cd py
  pip install -r requirements.txt

Developed by Joel Crombie (jc550) as a Summer Research Project
(ALPSS373-23C)

--------------------------------------------------
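
As a rough illustration of the kind of metadata check this project is
concerned with, the sketch below tallies how often a few fields are
filled in across the records of one downloaded file. It is not taken
from the code in py/. It assumes that each file in the dataset is a
gzipped JSON object with an "items" list of records, and that the
records use field names such as "DOI", "title", "author" and "abstract";
check both against the files you actually download. The function names
iter_records and field_coverage are made up for this example.

```python
import gzip
import json
from collections import Counter


def iter_records(path):
    """Yield each record from one gzipped JSON file.

    Assumption: the file holds a single JSON object whose "items"
    key is a list of records.
    """
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        payload = json.load(handle)
    for record in payload.get("items", []):
        yield record


def field_coverage(records, fields):
    """Count, for each field, how many records have a non-empty value.

    Returns (total_records, Counter mapping field -> count).
    """
    total = 0
    present = Counter()
    for record in records:
        total += 1
        for field in fields:
            # A missing key, None, "" and [] are all treated as absent.
            if record.get(field):
                present[field] += 1
    return total, present
```

Calling field_coverage(iter_records(path), ["DOI", "title", "author",
"abstract"]) on one of the downloaded files gives the record count and
the per-field tallies, which can be turned into percentages. Only one
file is loaded into memory at a time.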