Links to and extracts from what I've read so far on the Web Curator Tool (WCT), Heritrix, CommonCrawl, and the related WebDataCommons.