Timestamp: 2019-10-14T21:04:58+13:00
File: 1 edited
gs3-extensions/maori-lang-detection/MoreReading/crawling-Nutch.txt
r33558 r33565

---

----------------------------------------------------------------------
    Testing URLFilters: testing a URL to see if it's accepted
----------------------------------------------------------------------
Use the command
    ./bin/nutch org.apache.nutch.net.URLFilterChecker -allCombined
(mentioned at https://lucene.472066.n3.nabble.com/Correct-syntax-for-regex-urlfilter-txt-trying-to-exclude-single-path-results-td3600376.html)

Use as follows:

    cd apache-nutch-2.3.1/runtime/local

    ./bin/nutch org.apache.nutch.net.URLFilterChecker -allCombined

Then paste the URL you want to test and press Enter.
A + in front of the response means accepted.
A - in front of the response means rejected.
You can continue pasting URLs to test against the filters until you send Ctrl-D to terminate input.
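Since the checker reads URLs from standard input, it may also be possible to test many URLs in one go by redirecting a file into it instead of pasting. The following is only a sketch: the helper names (accepted_urls, rejected_urls) and the file urls-to-test.txt are made up for illustration, and it assumes each response line starts with the + or - marker immediately followed by the URL, with any other lines (such as a startup banner) ignored.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Nutch): they read URLFilterChecker
# output on stdin and split it by the leading +/- marker.
# Assumption: each response line is "+<url>" or "-<url>".

# Print only the URLs that the filters accepted (marker stripped).
accepted_urls() {
    sed -n 's/^+//p'
}

# Print only the URLs that the filters rejected (marker stripped).
rejected_urls() {
    sed -n 's/^-//p'
}

# Possible usage from apache-nutch-2.3.1/runtime/local, with one URL per
# line in the (hypothetical) file urls-to-test.txt:
#
#   ./bin/nutch org.apache.nutch.net.URLFilterChecker -allCombined \
#       < urls-to-test.txt | rejected_urls
```

End of file on the redirected input plays the same role as Ctrl-D in the interactive session, so the checker should exit once every URL in the file has been tested.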