|
|
@33680
|
5 years |
davidb |
Greenstone3 is fixed, so don't need to print out message about running …
|
|
|
@33679
|
5 years |
davidb |
Folder for working on updates (PDFs to del, PDFs to add) from Kiri
|
|
|
@33678
|
5 years |
davidb |
setup for greenstone ext
|
|
|
@33677
|
5 years |
davidb |
Intro text
|
|
|
@33676
|
5 years |
davidb |
Some initial work getting a plugin going that calls Alex's VirusTotal …
|
|
|
@33675
|
5 years |
ak19 |
Committing the newer query results (but from before today's …
|
|
|
@33674
|
5 years |
ak19 |
Changes to support the top 5 predicted langcodes and their confidence …
|
|
|
@33673
|
5 years |
ak19 |
Waikato Education Department's Science Activities and Maths Activities …
|
|
|
@33672
|
5 years |
kjdon |
modified slightly so that the error messages come from the dictionary …
|
|
|
@33671
|
5 years |
kjdon |
added a static getTextString method - currently this is in Action.java …
|
|
|
@33670
|
5 years |
kjdon |
added editEnabled att string
|
|
|
@33669
|
5 years |
kjdon |
removed an annoying debug message
|
|
|
@33668
|
5 years |
kjdon |
a few changes to debuginfo texts
|
|
|
@33667
|
5 years |
kjdon |
preProcess.xsl renamed to expand-gslib.xsl to better indicate what it does
|
|
|
@33666
|
5 years |
ak19 |
Having finished sending all the crawl data to mongodb 1. Recrawled the …
|
|
|
@33665
|
5 years |
davidb |
Fixed jar name
|
|
|
@33664
|
5 years |
davidb |
Initial version code for running VirusTotal API against files, CLI scripts
|
|
|
@33663
|
5 years |
davidb |
Changes after testing the scripts
|
|
|
@33662
|
5 years |
davidb |
Scripts to compile and run java code
|
|
|
@33661
|
5 years |
davidb |
Compiling needs to use Maven
|
|
|
@33660
|
5 years |
davidb |
For Java source code
|
|
|
@33659
|
5 years |
davidb |
Top-level folder for new extension based on VirusTotal API which scans …
|
|
|
@33658
|
5 years |
davidb |
Top-level folder for new extension based on VirusTotal API which scans …
|
|
|
@33657
|
5 years |
ak19 |
Some fixes after brief testing against 1/3 of the crawl. Restarted …
|
|
|
@33656
|
5 years |
ak19 |
Final minor changes before I start processing the crawls of node2.
|
|
|
@33655
|
5 years |
ak19 |
Minor change to print statement
|
|
|
@33654
|
5 years |
ak19 |
Removing jar file that wasn't used after all.
|
|
|
@33653
|
5 years |
ak19 |
1. As suggested by Dr Bainbridge, made the code changes to use Morphia …
|
|
|
@33652
|
5 years |
ak19 |
Introducing morphia subpackage
|
|
|
@33651
|
5 years |
ak19 |
1. Bugfix: overlappingSentences works. 2. storing numSentencesInMaor
|
|
|
@33650
|
5 years |
kjdon |
updated to match the new xsl file names; lots of variable renames to …
|
|
|
@33649
|
5 years |
kjdon |
renamed config_format and text_fragment_format to better represent …
|
|
|
@33648
|
5 years |
kjdon |
changed the debuginfo xsl and strings to match the new o=xxx debug options
|
|
|
@33647
|
5 years |
kjdon |
added/changed a few of the output values for debugging the transform
|
|
|
@33646
|
5 years |
ak19 |
Saving the mongodb queries and learning links that Dr Bainbridge found …
|
|
|
@33645
|
5 years |
ak19 |
Fix to 2 bugs when sending data to MongoDB: 1. overlappingSentences …
|
|
|
@33644
|
5 years |
ak19 |
Just committing the growing mongodb.txt file with links and …
|
|
|
@33643
|
5 years |
ak19 |
Brought the template log4j.properties.in back up to speed. I forgot it …
|
|
|
@33642
|
5 years |
ak19 |
Forgot to commit the java driver for mongodb when I committed the Java …
|
|
|
@33641
|
5 years |
kjdon |
commented out some debug statements
|
|
|
@33640
|
5 years |
kjdon |
oops, I must have 'tidied' up the file and then not compiled it to …
|
|
|
@33639
|
5 years |
kjdon |
need to select child nodes, otherwise the gsf:default node ends up in …
|
|
|
@33638
|
5 years |
kjdon |
gslib doesn't use xml-to-string.xsl. it's only used by formatmanager, …
|
|
|
@33637
|
5 years |
kjdon |
we can now use gsf and gslib in layout files.
|
|
|
@33636
|
5 years |
kjdon |
include means the stylesheet gets added inline, import means it gets …
|
|
|
@33635
|
5 years |
ak19 |
Maori-language-detection doesn't use Greenstone 3 at present, it's not …
|
|
|
@33634
|
5 years |
ak19 |
Rewrote NutchTextDumpProcessor as NutchTextDumpToMongoDB.java, which …
|
|
|
@33633
|
5 years |
ak19 |
1. TextLanguageDetector now has methods for collecting all sentences …
|
|
|
@33632
|
5 years |
kjdon |
overhaul of TransformingReceptionist. changed the order of inlining …
|
|
|
@33631
|
5 years |
kjdon |
added a bit more error reporting
|
|
|
@33630
|
5 years |
kjdon |
minor comment changes
|
|
|
@33629
|
5 years |
kjdon |
added methods using Parameter2 - for params with text node values
|
|
|
@33628
|
5 years |
kjdon |
not sure why documentNode was a gsf:template here. Can't be like that …
|
|
|
@33627
|
5 years |
kjdon |
removed unnecessary comments
|
|
|
@33626
|
5 years |
ak19 |
TODOs
|
|
|
@33625
|
5 years |
ak19 |
A file listing domains with seedurls containing /mi(/) that are …
|
|
|
@33624
|
5 years |
ak19 |
Some cleanup surrounding the now renamed function createSeedURLsFile, …
|
|
|
@33623
|
5 years |
ak19 |
1. Incorporated Dr Nichols' earlier suggestion of storing page modified …
|
|
|
@33622
|
5 years |
ak19 |
File rename
|
|
|
@33621
|
5 years |
ak19 |
Committing jotted down mongodb related instructions from what Dr …
|
|
|
@33620
|
5 years |
ak19 |
Final crawl, done on vagrant VM node6. Crawl site IDs 01407-01462.
|
|
|
@33619
|
5 years |
kjdon |
need to handle the case where a collection file (eg image) gets …
|
|
|
@33618
|
5 years |
ak19 |
Adding in the download URL
|
|
|
@33617
|
5 years |
ak19 |
Node5 is now full and here is the finished crawl (up to and including …
|
|
|
@33616
|
5 years |
ak19 |
Beginnings of Java class that is to interact with MongoDB. I don't yet …
|
|
|
@33615
|
5 years |
ak19 |
1. Worked out how to configure log4j to log both to console and …
|
|
|
@33614
|
5 years |
kjdon |
added a new line
|
|
|
@33613
|
5 years |
kjdon |
added allowdocumentediting and allowmapgpsediting options, plus also …
|
|
|
@33612
|
5 years |
kjdon |
work to do with params. add in default values to params if they are …
|
|
|
@33611
|
5 years |
kjdon |
added global setting to params - these are for params that are valid …
|
|
|
@33610
|
5 years |
kjdon |
USER_SESSION_CACHE_ATT moved to GSParams, as it is stored in session …
|
|
|
@33609
|
5 years |
ak19 |
The tar files containing the crawled sites data shouldn't be called …
|
|
|
@33608
|
5 years |
ak19 |
1. New script to export from HBase so that we could in theory reimport …
|
|
|
@33607
|
5 years |
ak19 |
Updated with the remaining successfully crawled sites on node4 before …
|
|
|
@33606
|
5 years |
ak19 |
1. Committing crawl data from node3 (2nd VM for nutch crawling). 2. …
|
|
|
@33605
|
5 years |
ak19 |
Node 4 VM still works, but committing first set of crawled sites on there
|
|
|
@33604
|
5 years |
ak19 |
1. Better output into possible-product-sites.txt including the …
|
|
|
@33603
|
5 years |
ak19 |
Incorporating Dr Nichols' suggestion to help weed out product sites: if …
|
|
|
@33602
|
5 years |
ak19 |
1. The final csv file, mri-sentences.csv, is now written out. 2. Only …
|
|
|
@33601
|
5 years |
ak19 |
Creates the 2nd csv file, with info about webpages. At present stores …
|
|
|
@33600
|
5 years |
ak19 |
Work in progress of writing out CSV files. In future, may write the …
|
|
|
@33599
|
5 years |
ak19 |
First one-third sites crawled. Committing to SVN despite the tarred …
|
|
|
@33598
|
5 years |
ak19 |
More instructions on setting up Nutch now that I've remembered to …
|
|
|
@33597
|
5 years |
ak19 |
Committing active version of template file which has a newline at end …
|
|
|
@33596
|
5 years |
ak19 |
Adding in the nutch-site.xml and regex-urlfilter.GS_TEMPLATE template …
|
|
|
@33595
|
5 years |
kjdon |
new displayBaskets template - to avoid replicating code in query and …
|
|
|
@33594
|
5 years |
kjdon |
call gslib:displayBasket instead of replicating the code here
|
|
|
@33593
|
5 years |
kjdon |
the test for facets should be facetList/facet/count, as the facets get …
|
|
|
@33592
|
5 years |
kjdon |
reindented the file
|
|
|
@33591
|
5 years |
kjdon |
added in some strings for 'this collection contains x documents and …
|
|
|
@33590
|
5 years |
kjdon |
added 'this collection contains X documents and was last built Y days …
|
|
|
@33589
|
5 years |
cpb16 |
final01. Need Map results still
|
|
|
@33588
|
5 years |
ak19 |
Committing the MRI sentence model that I'm actually using, the one in …
|
|
|
@33587
|
5 years |
ak19 |
1. Better stats reporting on crawled sites: not just if a page was in …
|
|
|
@33586
|
5 years |
ak19 |
Refactored MaoriTextDetector.java class into more general …
|
|
|
@33585
|
5 years |
ak19 |
Much simpler way of using sentence and language detection model to …
|
|
|
@33584
|
5 years |
ak19 |
Committing experimental version 2 using the sentence detector model, …
|
|
|
@33583
|
5 years |
ak19 |
Committing experimental version 1 using the sentence detector model, …
|
|
|
@33582
|
5 years |
ak19 |
NutchTextDumpProcessor prints each crawled site's stats: number of …
|
|
|
@33581
|
5 years |
ak19 |
Minor fix. Noticed when looking for work I did on MRI sentence detection
|
|
|