source: gs3-extensions/maori-lang-detection

Diff Rev Age Author Log Message
(edit) @33425   5 years ak19 A few more links now that I got past getting the vagrant VM with spark …
(edit) @33423   5 years ak19 Adding in the link to the vagrant VM with Hadoop, Spark for cluster …
(edit) @33422   5 years ak19 Some more links.
(edit) @33419   5 years ak19 Last evening, I had found some links about how language-detection is …
(edit) @33414   5 years ak19 Adding important links
(edit) @33413   5 years ak19 Splitting the get_commoncrawl_nz_urls.sh script back into 2 scripts, …
(edit) @33412   5 years ak19 config command for wgetting a single file
(edit) @33411   5 years ak19 Newer version now doesn't mirror sites with wget but gets WET files …
(edit) @33410   5 years ak19 Committing some variable name changes before I replace this file with …
(edit) @33409   5 years ak19 Forgot to commit 2 files with links and shuffling some links around …
(edit) @33408   5 years ak19 Some rough notes. Will move into appropriate file later.
(edit) @33407   5 years ak19 gutil.jar was rebuilt yesterday in GS3 after a bugfix. Recommitting …
(edit) @33405   5 years ak19 Even though we're probably not going to use this code after all, will …
(edit) @33404   5 years ak19 1. Links to other Java ways of extracting text from web content. 2. …
(edit) @33402   5 years ak19 Beginnings of the Java class to wget sites and process its pages to …
(edit) @33401   5 years ak19 MaoriTextDetector.class file now generated inside its package folder …
(edit) @33400   5 years ak19 1. Setting up log4j.properties based on the macronizer's basic one …
(edit) @33399   5 years ak19 Putting properties files into the conf folder and keeping the lib …
(edit) @33398   5 years ak19 Committing the actual package structure and the updated README after …
(edit) @33397   5 years ak19 1. Changing package structure and instructions on compiling/running as …
(edit) @33394   5 years ak19 1. Started a file on feasibility with the data now available and some …
(edit) @33393   5 years ak19 Modified the get_commoncrawl_nz_urls.sh to also create a reduced urls …
(edit) @33391   5 years ak19 Some rough bash scripting lines that work but aren't complete.
(edit) @33390   5 years ak19 Minor message telling the user to wait for a task that takes some time.
(edit) @33379   5 years ak19 New script to automate getting a file listing of the common crawl URL …
(edit) @33378   5 years ak19 New bin/script folder and relocating gen_SentenceDetection_model.sh to …
(edit) @33377   5 years ak19 Changes to get gen_SentenceDetection_model.sh to run still from the …
(edit) @33376   5 years ak19 Links and extracts I've read so far on the Web Curator Tool (WCT), …
(edit) @33358   5 years ak19 More minor changes to README
(edit) @33357   5 years ak19 Minor changes
(edit) @33356   5 years ak19 Updating script. Correction to a filepath different in the svn folder …
(edit) @33355   5 years ak19 Changes for adding in the new gen_SentenceDetection_model.sh script, …
(edit) @33350   5 years ak19 Better comments. Tested macronised vs unmacronised Māori language test …
(edit) @33339   5 years ak19 Updated README.
(edit) @33338   5 years ak19 1. After renaming the java class, changed all occurrences of the old …
(edit) @33337   5 years ak19 Renaming the class to MaoriTextDetector, since it doesn't detect audio …
(edit) @33336   5 years ak19 Major rewrite to make this class more useful to callers. …
(add) @33335   5 years ak19 First java file for Māori language detection using openNLP with the …