source: main/trunk/greenstone2/build-src/packages/w3mir/w3mir-1.0.8/README@ 22694

Last change on this file since 22694 was 719, checked in by davidb, 25 years ago
added w3mir package
Property svn:keywords set to `Author Date Id Revision`
File size: 5.7 KB

PROPAGANDA:

See http://www.math.uio.no/~janl/w3mir/ for propaganda.

--------------------------------------------------------------------------
Q: Where can I get a new version of w3mir?
A: The w3mir homepage is at http://www.math.uio.no/~janl/w3mir/.
   W3mir is also distributed on CPAN: http://www.perl.com/

Q: Are there any mailing lists?
A: Yes, see below.

Q: Should I subscribe to any of the mailing lists?
A: Yes, if you use w3mir at all you should subscribe to
   [email protected]; send e-mail to [email protected] to be
   subscribed.

Q: I found a bug!
A: See below.

Q: How do I...
A: There is a w3mir-HOWTO.html file in the distribution. Read it.

--------------------------------------------------------------------------
BUGS:

- The -lc switch does not work too well.

Please see below for how to report bugs.

--------------------------------------------------------------------------
FEATURES (NOT bugs):

- URLs with two slashes ('//') in the path component do not work
  as some might expect. According to my reading of the HTTP/URL spec,
  it is an illegal construct, which is a Good Thing, because I don't
  know how to handle it if it's legal.
- If you start at http://foo/bar/ then index.html might be fetched twice.
- Some documents point to a location above the server root, e.g.,
  http://some.server/../stuff.html. Netscape and other browsers, in
  defiance of the URL standard documents, will change the URL to
  http://some.server/stuff.html. W3mir will not.
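
The two URL shapes described above can be spotted before a fetch is
attempted. What follows is only a minimal sketch in Python, not w3mir's
own (Perl) code; the helper names has_double_slash and climbs_above_root
are hypothetical. It classifies URLs and deliberately does not rewrite
them, in line with the behaviour described for w3mir.

```python
from urllib.parse import urlsplit


def has_double_slash(url):
    """Return True if the path component contains an empty segment ('//')."""
    return "//" in urlsplit(url).path


def climbs_above_root(url):
    """Return True if '..' segments in the path would leave the server root."""
    depth = 0
    for segment in urlsplit(url).path.split("/"):
        if segment == "..":
            depth -= 1
            if depth < 0:
                return True
        elif segment not in ("", "."):
            # Any ordinary segment takes us one level deeper.
            depth += 1
    return False
```

A caller could log or skip such URLs, e.g.
climbs_above_root("http://some.server/../stuff.html") flags the example
given above.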

--------------------------------------------------------------------------
MAIL LISTS, REPORTING BUGS:

Please send ideas (please see the todo lists further down) and bug
reports to [email protected]. Please include the URL and the command
line/config file that triggered the bug. [email protected] is mainly
used for announcing new versions or bugs to w3mir users. You will get
more information by following that list than anywhere else, and it is
very low volume. To subscribe to these lists, email [email protected].
The w3mir-core list is intended for w3mir hackers only.

--------------------------------------------------------------------------
COPYRIGHTS:

w3mir, w3http.pm, w3pdfuri.pm and htmlop.pm are free, but they are
copyrighted by the various involved hackers. If you want to copy,
hack or distribute w3mir you can do that provided you comply with the
'Artistic License' enclosed in the w3mir distribution in the file
named Artistic.

--------------------------------------------------------------------------
CREDITS:

- Oscar Nierstrasz: Wrote htget.
- Gorm Haug Eriksen: Started w3mir on the foundations of htget,
  contributed code later.
- Nicolai Langfeldt: Learning from Oscar's and Gorm's mistakes, rewrote
  everything.
- Chris Szurgot: Adapting to win32, good ideas and code contribs,
  debugging. And criticism.
- Rik Faith: Uses w3mir extensively, not shy about complaining and
  commenting and suggesting.
- The libwww-perl author(s), who made adding some new features
  ridiculously easy.

--------------------------------------------------------------------------
TODO LIST:

* TODO, version 1.1:

Some of these are speculative, some others are very useful.

- CSS parsing/support at the same level as HTML.
- Full support for APPLETS/OBJECT tags.
- Alias rules. These would enable w3mir to map ukoln.bath.ac.uk and
  bubl.bath.ac.uk to www.ukoln.ac.uk and know that the objects contained
  in these are all the same. Another use would be to mirror from a mirror
  instead of the _real_ site, when the original site to which you have
  references is on a slow link while the mirror is on a fast link.
- FTP support (easy if through an HTTP-style FTP proxy, but is that what
  we want?)
- SHTTP/SSL support.
- Add a 3rd argument to URL/Also that denotes the depth of documents to
  be retrieved under each; this would automatically generate fetch/ignore
  rules. How well these work would depend on the order of the
  URL/Also directives, though. Must be documented.
- Retrieve recursively until N-th order links are reached. This
  differs significantly from directory recursion, which we do now. When
  this is done w3mir should also know the difference between inline and
  external links; inline links should always be retrieved. Trivia question:
  What order is needed to reach every findable document on the web from
  Yahoo?
- Integrate with cvs or rcs (or another version control system) to make
  the retriever able to reproduce the mirrored site for any given date.
- Some text processing: Adding and removing text/sgml comments when suitable
  options and tags are found.
- Put retrieval date-stamps as comments in HTML files, to document when
  and how the document was retrieved.
- Example: If you're mirroring a site primarily to get to the papers, but
  the site has n versions of each paper (foo.ps.gz, foo.ps.Z, foo.dvi.gz,
  foo.dvi.Z, foo.tar.gz, foo.zip) and you only need one version. Implement
  a way to get only one version of documents provided in multiple versions,
  something like a multi-axis preference list to get only the most attractive
  version of the doc.
- Logging of retrievals to file; need to change every print to a function call.
- Your suggestion here.
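
The multi-version item above could be approached with one preference
list per axis. The following Python sketch is one possible reading of
that idea, not an existing w3mir feature: the two axes (document format
and compression), their ordering, and the names FORMAT_PREFERENCE,
COMPRESSION_PREFERENCE, split_name, rank and pick_versions are all
assumptions made for illustration.

```python
import os

# Hypothetical preference axes: earlier entries are more attractive.
FORMAT_PREFERENCE = [".ps", ".dvi", ".tar", ".zip"]
COMPRESSION_PREFERENCE = [".gz", ".Z", ""]


def split_name(filename):
    """Split 'foo.ps.gz' into (stem, format, compression)."""
    base, ext = os.path.splitext(filename)
    if ext in (".gz", ".Z"):
        stem, fmt = os.path.splitext(base)
        return stem, fmt, ext
    return base, ext, ""


def rank(filename):
    """Return a tuple that sorts lower for more attractive versions."""
    _, fmt, comp = split_name(filename)

    def position(value, axis):
        # Unknown values sort after everything listed on the axis.
        return axis.index(value) if value in axis else len(axis)

    return (position(fmt, FORMAT_PREFERENCE),
            position(comp, COMPRESSION_PREFERENCE))


def pick_versions(filenames):
    """Keep only the best-ranked file for each document stem."""
    best = {}
    for name in filenames:
        stem = split_name(name)[0]
        if stem not in best or rank(name) < rank(best[stem]):
            best[stem] = name
    return best
```

With the file names from the example, pick_versions keeps foo.ps.gz and
drops the other five. Comparing tuples makes the format axis dominate
the compression axis; a different weighting would need a different rank
function.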

* TODO, http related
- Use Keep-alive. Then we should probably stop using 30 second pauses
  between document retrievals.
- HTTP/1.1? HTTP/1.1 servers should do keep-alive even with 1.0 requests.
- Separate queues for each server, interleave requests.
- Authentication is only done if challenged now. If a document in a
  directory needs authentication it is very likely that all other documents
  in that directory and subdirectories require authentication.
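
The authentication item above amounts to remembering which directories
have issued a challenge and sending credentials up front for anything
beneath them. A minimal Python sketch of that bookkeeping follows; it
assumes nothing about how w3mir stores credentials, and the class name
AuthScopeCache and its methods are hypothetical.

```python
from urllib.parse import urlsplit


class AuthScopeCache:
    """Remember directories that have issued an authentication challenge."""

    def __init__(self):
        # Entries are (scheme, netloc, directory path ending in '/').
        self._protected = set()

    @staticmethod
    def _directory(path):
        # '/docs/a.html' -> '/docs/', '/docs/' -> '/docs/', '' -> '/'
        if "/" not in path:
            return "/"
        return path[: path.rfind("/") + 1]

    def record_challenge(self, url):
        """Call this when a request for url was answered with a challenge."""
        parts = urlsplit(url)
        self._protected.add(
            (parts.scheme, parts.netloc, self._directory(parts.path))
        )

    def needs_credentials(self, url):
        """True if url lies in or below a directory that challenged before."""
        parts = urlsplit(url)
        path = parts.path or "/"
        return any(
            scheme == parts.scheme
            and netloc == parts.netloc
            and path.startswith(directory)
            for scheme, netloc, directory in self._protected
        )
```

After record_challenge("http://some.server/private/report.html"), a
request for http://some.server/private/sub/x.html would be sent with
credentials straight away, while http://some.server/public/x.html would
still wait for a challenge.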

* If perl gets threads:
- Make the retrieval and analysis engines separate threads, and have
  one retrieval thread per method/server/port and do parallel retrievals.
- This makes HTTP/1.1 easier too.
