Introduce two new Java files, NutchTextDumpProcessor and TextDumpPage, both still being written and untested. NutchTextDumpProcessor uses TextDumpPage to parse the text dump (dump.txt) of each site crawled by Nutch.
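
For illustration only, here is a minimal sketch of how a processor might split a text dump into per-page objects. It is not the code in these files. The class name TextDumpSketch, its nested Page class, and the parse method are hypothetical stand-ins, and the "Recno::" and "URL::" line markers are an assumption about the layout of dump.txt that should be checked against the real file.

```java
import java.util.ArrayList;
import java.util.List;

public class TextDumpSketch {

    /** Hypothetical stand-in for a per-page holder such as TextDumpPage. */
    public static final class Page {
        public final String url;
        public final String body;

        Page(String url, String body) {
            this.url = url;
            this.body = body;
        }
    }

    // Assumed markers; adjust to whatever dump.txt actually contains.
    private static final String RECORD_MARKER = "Recno::";
    private static final String URL_MARKER = "URL::";

    /** Splits the dump text into pages, starting a new page at each record marker. */
    public static List<Page> parse(String dump) {
        List<Page> pages = new ArrayList<>();
        String url = null;
        StringBuilder body = new StringBuilder();
        boolean inRecord = false;

        for (String line : dump.split("\\r?\\n")) {
            if (line.startsWith(RECORD_MARKER)) {
                // A new record begins: flush the previous one, if any.
                if (inRecord) {
                    pages.add(new Page(url, body.toString().trim()));
                }
                inRecord = true;
                url = null;
                body.setLength(0);
            } else if (inRecord && url == null && line.startsWith(URL_MARKER)) {
                // The first URL line inside a record names the page.
                url = line.substring(URL_MARKER.length()).trim();
            } else if (inRecord) {
                // Everything else inside a record is kept as body text.
                body.append(line).append('\n');
            }
        }
        // Flush the last record, which has no following marker.
        if (inRecord) {
            pages.add(new Page(url, body.toString().trim()));
        }
        return pages;
    }

    public static void main(String[] args) {
        String sample = "Recno:: 0\nURL:: http://example.com/\nHello\n\n"
                + "Recno:: 1\nURL:: http://example.org/\nWorld\n";
        for (Page p : parse(sample)) {
            System.out.println(p.url + " -> " + p.body);
        }
    }
}
```

In practice the input string would come from reading each site's dump.txt, for example with java.nio.file.Files. Lines before the first record marker are ignored in this sketch, and a record without a URL line yields a page whose url is null.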