| @33425 | 5 years | ak19 | A few more links now that I got past getting the vagrant VM with spark … |
| @33423 | 5 years | ak19 | Adding in the link to the vagrant VM with Hadoop, Spark for cluster … |
| @33422 | 5 years | ak19 | Some more links. |
| @33419 | 5 years | ak19 | Last evening, I had found some links about how language-detection is … |
| @33414 | 5 years | ak19 | Adding important links |
| @33413 | 5 years | ak19 | Splitting the get_commoncrawl_nz_urls.sh script back into 2 scripts, … |
| @33412 | 5 years | ak19 | config command for wgetting a single file |
| @33411 | 5 years | ak19 | Newer version now doesn't mirror sites with wget but gets WET files … |
| @33410 | 5 years | ak19 | Committing some variable name changes before I replace this file with … |
| @33409 | 5 years | ak19 | Forgot to commit 2 files with links and shuffling some links around … |
| @33408 | 5 years | ak19 | Some rough notes. Will move into appropriate file later. |
| @33407 | 5 years | ak19 | gutil.jar was rebuilt yesterday in GS3 after a bugfix. Recommitting … |
| @33405 | 5 years | ak19 | Even though we're probably not going to use this code after all, will … |
| @33404 | 5 years | ak19 | 1. Links to other Java ways of extracting text from web content. 2. … |
| @33402 | 5 years | ak19 | Beginnings of the Java class to wget sites and process its pages to … |
| @33401 | 5 years | ak19 | MaoriTextDetector.class file now generated inside its package folder … |
| @33400 | 5 years | ak19 | 1. Setting up log4j.properties based on the macronizer's basic one … |
| @33399 | 5 years | ak19 | Putting properties files into the conf folder and keeping the lib … |
| @33398 | 5 years | ak19 | Committing the actual package structure and the updated README after … |
| @33397 | 5 years | ak19 | 1. Changing package structure and instructions on compiling/running as … |
| @33394 | 5 years | ak19 | 1. Started a file on feasibility with the data now available and some … |
| @33393 | 5 years | ak19 | Modified the get_commoncrawl_nz_urls.sh to also create a reduced urls … |
| @33391 | 5 years | ak19 | Some rough bash scripting lines that work but aren't complete. |
| @33390 | 5 years | ak19 | Minor message telling the user to wait for a task that takes some time. |
| @33379 | 5 years | ak19 | New script to automate getting a file listing of the common crawl URL … |
| @33378 | 5 years | ak19 | New bin/script folder and relocating gen_SentenceDetection_model.sh to … |
| @33377 | 5 years | ak19 | Changes to get gen_SentenceDetection_model.sh to run still from the … |
| @33376 | 5 years | ak19 | Links and extracts I've read so far on the Web Curator Tool (WCT), … |
| @33358 | 5 years | ak19 | More minor changes to README |
| @33357 | 5 years | ak19 | Minor changes |
| @33356 | 5 years | ak19 | Updating script. Correction to a filepath different in the svn folder … |
| @33355 | 5 years | ak19 | Changes for adding in the new gen_SentenceDetection_model.sh script, … |
| @33350 | 5 years | ak19 | Better comments. Tested macronised vs unmacronised Māori language test … |
| @33339 | 5 years | ak19 | Updated README. |
| @33338 | 5 years | ak19 | 1. After renaming the Java class, changed all occurrences of the old … |
| @33337 | 5 years | ak19 | Renaming the class to MaoriTextDetector, since it doesn't detect audio … |
| @33336 | 5 years | ak19 | Major rewrite to make this class more useful to callers. … |
| @33335 | 5 years | ak19 | First Java file for Māori language detection using openNLP with the … |