Timestamp:
2019-09-16T19:45:01+12:00 (5 years ago)
Author:
ak19
Message:

It is much harder to remove pages where words are fused together: some fused words are shorter than the maximum valid word length of 15 chars while others are long, and the number of valid words on such a page can still come to more than the required minimum of 20. The next solution was to ignore pages that had more than 2 instances of camelcase, but valid pages (actual Maori language pages) may end up with a few more camelcased words if navigation items get fused together. Not sure what to do.
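
The checks described in the message can be sketched roughly as follows. This is a hypothetical illustration only, not the code in WETProcessor.java: the class name, the method names, the whitespace tokenisation and the camelcase test (a lowercase letter directly followed by an uppercase one) are all assumptions, and the constants simply mirror the values added to config.properties in this changeset.

```java
import java.util.regex.Pattern;

// Hypothetical sketch of the content checks; NOT the actual WETProcessor.java.
final class WetRecordFilterSketch {
    // Cutoffs copied from config.properties in this changeset.
    static final int MAX_WORD_LENGTH = 15;
    static final int MIN_NUM_WORDS = 20;
    static final int MAX_WORDS_CAMELCASE = 10;

    // Assumed definition of camelcase: a lowercase letter immediately
    // followed by an uppercase letter, as in the fused "HomeAbout".
    private static final Pattern CAMEL_CASE = Pattern.compile("\\p{Ll}\\p{Lu}");

    static boolean isCamelCase(String word) {
        return CAMEL_CASE.matcher(word).find();
    }

    // Returns true if the text has enough words of plausible length
    // and not too many camelcased (probably fused) words.
    static boolean hasSensibleContent(String text) {
        int validWords = 0;
        int camelCaseWords = 0;
        for (String word : text.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // an empty input yields a single empty token
            }
            if (word.length() <= MAX_WORD_LENGTH) {
                validWords++;
            }
            if (isCamelCase(word)) {
                camelCaseWords++;
            }
        }
        return validWords >= MIN_NUM_WORDS
                && camelCaseWords <= MAX_WORDS_CAMELCASE;
    }
}
```

The sketch also shows the weakness the message points out: a short fused word such as "HomeAbout" still counts as a valid word, so only the camelcase count can reject such a page, and a genuine page with many fused navigation items would be rejected the same way.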

File:
1 edited

  • gs3-extensions/maori-lang-detection/conf/config.properties

    r33467  r33480
    15      15      WETprocessor.min.content.length.wrapped.line=500
    16      16      WETprocessor.min.spaces.per.wrapped.line=10
            17
            18      # Arbitrary cutoff values for WETProcessor.java
            19      # for determining whether a WET record has sufficient and sensible content
            20      WETprocessor.max.word.length=15
            21      WETprocessor.min.num.words=20
            22      WETprocessor.max.words.camelcase=10
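
For illustration, the three new keys could be read with the standard java.util.Properties class along the following lines. This is a minimal sketch under stated assumptions: the class CutoffConfigSketch, its field names and the fallback defaults (set to the values in this changeset) are hypothetical, and how WETProcessor.java actually loads config.properties is not shown here.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical holder for the cutoff values; not part of the changeset.
final class CutoffConfigSketch {
    final int maxWordLength;
    final int minNumWords;
    final int maxWordsCamelCase;

    CutoffConfigSketch(Properties props) {
        // Fallbacks are an assumption: they repeat the values from this changeset.
        maxWordLength = intProp(props, "WETprocessor.max.word.length", 15);
        minNumWords = intProp(props, "WETprocessor.min.num.words", 20);
        maxWordsCamelCase = intProp(props, "WETprocessor.max.words.camelcase", 10);
    }

    // Reads an integer property, using the fallback if it is missing or malformed.
    static int intProp(Properties props, String key, int fallback) {
        String raw = props.getProperty(key);
        if (raw == null) {
            return fallback;
        }
        try {
            return Integer.parseInt(raw.trim());
        } catch (NumberFormatException e) {
            return fallback;
        }
    }

    // Loads from any Reader, e.g. a FileReader opened on config.properties.
    static CutoffConfigSketch load(Reader reader) throws IOException {
        Properties props = new Properties();
        props.load(reader); // lines starting with '#' are treated as comments
        return new CutoffConfigSketch(props);
    }

    // Convenience for in-memory text; wraps the checked exception.
    static CutoffConfigSketch fromString(String text) {
        try {
            return load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Falling back to a default rather than failing on a malformed value is a design choice for the sketch only; a stricter loader might prefer to abort so that a typo in the properties file is noticed.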