Timestamp: 2019-08-13T15:09:28+12:00 (5 years ago)
Author: ak19
Message: Some rough notes. Will move into appropriate file later.

File: 1 edited

  • gs3-extensions/maori-lang-detection/MoreReading/other.txt

    --- MoreReading/other.txt (r33404)
    +++ MoreReading/other.txt (r33408)
    @@ -19,4 +19,5 @@
     
     https://gist.github.com/svemir/4207353
    +(Hadoop related) A Common Crawl Experiment
     
     https://gist.github.com/Smerity/afe7430fdb4371015466
    @@ -32,2 +33,11 @@
     
     https://dmorgan.info/posts/common-crawl-python/
    +https://groups.google.com/forum/#!topic/common-crawl/pdI3w09AAbQ
    +
    +Example:
    +WARC:
    +tikauka:[142]/Scratch/anupama/maori-lang-detection>wget https://commoncrawl.s3.amazonaws.com/crawl-data/CC-MAIN-2019-30/segments/1563195526237.47/crawldiagnostics/CC-MAIN-20190719115720-20190719141720-00077.warc.gz
    +WET:
    +tikauka:[142]/Scratch/anupama/maori-lang-detection>wget https://commoncrawl.s3.amazonaws.com/crawl-data/CC-MAIN-2019-30/segments/1563195526237.47/wet/CC-MAIN-20190719115720-20190719141720-00508.warc.wet.gz
    +tikauka:[142]/Scratch/anupama/maori-lang-detection>gunzip CC-MAIN-20190719115720-20190719141720-00508.warc.wet.gz
    +
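
A rough sketch of how the WET file fetched in the example might be read once it is on disk. It is not part of the changeset. It assumes the file follows the usual WARC record layout (a `WARC/1.0` version line, `Name: value` headers, a blank line, then `Content-Length` bytes of extracted text) and uses only the Python standard library. The function name `iter_wet_records` is made up for illustration.

```python
import gzip


def iter_wet_records(path):
    """Yield (headers, text) for each record in a .warc.wet or .warc.wet.gz file.

    Hypothetical helper: assumes every record carries a Content-Length header
    giving the size of its body in bytes.
    """
    # Accept both the gunzipped file and the original .gz download.
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rb") as stream:
        while True:
            line = stream.readline()
            if not line:
                return  # end of file
            if not line.startswith(b"WARC/"):
                continue  # blank separator lines between records

            # Read header lines until the blank line that ends the header block.
            headers = {}
            while True:
                line = stream.readline()
                if line in (b"\r\n", b"\n", b""):
                    break
                name, _, value = line.decode("utf-8", "replace").partition(":")
                headers[name.strip()] = value.strip()

            # Read the body by length so text that looks like a header is not misparsed.
            length = int(headers.get("Content-Length", 0))
            body = stream.read(length)
            yield headers, body.decode("utf-8", "replace")


# Possible use with the file from the example above:
#
#   path = "CC-MAIN-20190719115720-20190719141720-00508.warc.wet"
#   for headers, text in iter_wet_records(path):
#       if headers.get("WARC-Type") == "conversion":
#           print(headers.get("WARC-Target-URI"), len(text))
```

Reading the body by `Content-Length` rather than scanning for the next `WARC/` line keeps the sketch robust against page text that happens to start with that string.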