Changeset 21876 for main/trunk
- Timestamp: 2010-04-13T15:29:42+12:00
- File: 1 edited
main/trunk/greenstone2/perllib/classify/Phind.pm
--- main/trunk/greenstone2/perllib/classify/Phind.pm (revision 20454)
+++ main/trunk/greenstone2/perllib/classify/Phind.pm (revision 21876)
@@ -29,6 +29,4 @@
 # The Phind clasifier plugin.
 # Type "classinfo.pl Phind" at the command line for a summary.
-
-# 12/05/02 Added usage datastructure - John Thompson
 
 package Phind;
@@ -472,5 +470,5 @@
 }
 
-if ($language_exp =~ / en/) {
+if ($language_exp =~ /^en$/) {
 return &convert_gml_to_tokens_EN($text);
 }
@@ -504,6 +502,4 @@
 
 
-
-
 # 2. Split the remaining text into space-delimited tokens
 
@@ -513,5 +509,5 @@
 # Split text at word boundaries
 s/\b/ /go;
-
+
 # 3. Convert the remaining text to "clause format"
 
@@ -521,5 +517,10 @@
 
 # remove unnecessary punctuation and replace with clause break symbol (\n)
-s/[^\w ]/\n/go;
+# the following very nicely removes all non alphanumeric characters. too bad if you are not using english...
+#s/[^\w ]/\n/go;
+# replace punct with new lines - is this what we want??
+s/\s*[\?\;\:\!\,\.\"\[\]\{\}\(\)]\s*/\n/go; #"
+# then remove other punct with space
+s/[\'\`\\\_]/ /go;
 
 # remove extraneous whitespace
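The two behavioural changes in this changeset can be tried in isolation with a short script. This is only a sketch, not code from Phind.pm: the subroutine names and the sample strings are hypothetical, and the substitutions are applied to a copy of the argument rather than to `$_`. The old language test is assumed here to be an unanchored match on "en"; the viewer renders it as `/ en/`, and the space may be a rendering artifact. The sample text is a raw UTF-8 byte string, on the assumption that the classifier may see undecoded bytes, in which case `\w` follows ASCII rules.

```perl
use strict;
use warnings;

# Assumed old language test: "en" anywhere in the string.
sub matches_en_unanchored {
    my ($language_exp) = @_;
    return $language_exp =~ /en/ ? 1 : 0;
}

# New language test from r21876: the whole string must be "en".
sub matches_en_anchored {
    my ($language_exp) = @_;
    return $language_exp =~ /^en$/ ? 1 : 0;
}

# Old clause splitting (r20454): anything that is not \w or a space
# becomes a clause break.
sub clause_breaks_old {
    my ($text) = @_;
    $text =~ s/[^\w ]/\n/g;
    return $text;
}

# New clause splitting (r21876): only the listed punctuation (with any
# surrounding whitespace) becomes a clause break; a few other marks
# are then replaced by a space.
sub clause_breaks_new {
    my ($text) = @_;
    $text =~ s/\s*[\?\;\:\!\,\.\"\[\]\{\}\(\)]\s*/\n/g;
    $text =~ s/[\'\`\\\_]/ /g;
    return $text;
}

# Hypothetical language expressions.
for my $exp ("en", "fr_en") {
    printf "%-6s unanchored=%d anchored=%d\n",
        $exp, matches_en_unanchored($exp), matches_en_anchored($exp);
}

# "cafe, the" with accented final letters, written as raw UTF-8 bytes.
my $sample = "caf\xc3\xa9, th\xc3\xa9";

for my $pair (["old", clause_breaks_old($sample)],
              ["new", clause_breaks_new($sample)]) {
    my ($label, $out) = @$pair;
    (my $shown = $out) =~ s/\n/\\n/g;   # make clause breaks visible
    print "$label: $shown\n";
}
```

On a byte string like this, the old substitution turns each byte of an accented character into a clause break, while the new one leaves those bytes alone and breaks only at the comma. The trade-off is that any punctuation outside the two explicit lists now passes through unchanged.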