Last change on this file: changeset 33555, checked in by ak19, 4 years ago.
Modified the top sites list as Dr Bainbridge described: suffix variants of the same resource (e.g. google.com, google.it) are all retained. However, I have removed subdomain prefixes, e.g. translate.google.com is removed since google.com is already there. Used the latest versions of the Alexa and Wikipedia top sites pages, and the Moz top 500 sites page that Dr Bainbridge had found.

File size: 6.7 KB
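The reduction rule in the commit message (keep suffix variants such as google.com and google.it, but drop a <subdomain>.<site>.ext entry when <site>.ext is already listed), combined with the merge, sort and de-duplicate steps in the file header, can be sketched in Python. This is a minimal sketch assuming the inputs are bare host names, one per line. `reduce_top_sites` is a hypothetical helper, not the LibreOffice Calc and Gedit regex workflow that was actually used, so its output need not match this file exactly.

```python
def reduce_top_sites(lines):
    """Hypothetical helper: merge host names from several sources, drop
    duplicates, drop <subdomain>.<site>.ext entries whose <site>.ext is also
    present, and return the survivors sorted alphabetically."""
    # Normalise each line; skip blank lines and '#' comment lines.
    hosts = {line.strip().lower() for line in lines
             if line.strip() and not line.strip().startswith("#")}

    def has_listed_parent(host):
        labels = host.split(".")
        # For translate.google.com this checks google.com, then com.
        return any(".".join(labels[i:]) in hosts
                   for i in range(1, len(labels)))

    # Suffix variants (google.com, google.it) are not parents of each other,
    # so both survive.
    return sorted(h for h in hosts if not has_listed_parent(h))


alexa = ["google.com", "translate.google.com", "bbc.co.uk"]
wiki = ["google.it", "google.com"]
print(reduce_top_sites(alexa + wiki))  # ['bbc.co.uk', 'google.com', 'google.it']
```

Using a set for the merged hosts makes the duplicate removal and the parent lookups cheap; sorting happens once at the end, mirroring the final re-sort described in the header.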
# Top sites: base URL forms

# Contains the Alexa top sites (only the first 50 were visible).
# Added further top sites from https://en.wikipedia.org/wiki/List_of_most_popular_websites
# Then also added https://moz.com/top500 by downloading its CSV file and
# adding its URLs to the existing Alexa/Wikipedia listing here.
# Then used LibreOffice Calc to sort alphabetically and remove duplicates.
# Then, in Gedit, used a regex search and replace to remove <subdomain>.<site>.ext variants,
# keeping just <site>.ext.
# Finally, re-sorted the reduced list alphabetically and pasted it in here.

000webhost.com
360.cn
4shared.com
a8.net
abc.es
abc.net.au
abcnews.go.com
about.com
about.me
aboutads.info
abril.com.br
academia.edu
accuweather.com
addthis.com
addtoany.com
adobe.com
adweek.com
airbnb.com
akamaihd.net
alexa.com
alibaba.com
aliexpress.com
alipay.com
aljazeera.com
allaboutcookies.org
allrecipes.com
amazon.ca
amazon.co.jp
amazon.co.uk
amazon.com
amazon.de
amazon.es
amazon.fr
amazon.in
ameblo.jp
ampproject.org
android.com
aol.com
ap.org
apache.org
apachefriends.org
apple.com
archive.org
archives.gov
arstechnica.com
arxiv.org
asahi.com
ask.fm
asus.com
axs.com
babytree.com
baidu.com
bandcamp.com
bbc.co.uk
bbc.com
behance.net
berkeley.edu
biblegateway.com
biglobe.ne.jp
billboard.com
bing.com
bit.ly
bitly.com
blackberry.com
blogger.com
blogspot.com
bloomberg.com
booking.com
boston.com
box.com
britannica.com
bt.com
bund.de
businessinsider.com
businesswire.com
buydomains.com
buzzfeed.com
ca.gov
cambridge.org
canalblog.com
cbc.ca
cbslocal.com
cbsnews.com
cdc.gov
change.org
channel4.com
chicagotribune.com
chinadaily.com.cn
cisco.com
clickbank.net
cloudflare.com
cmu.edu
cnbc.com
cnet.com
cnn.com
cocolog-nifty.com
columbia.edu
connect.over-blog.com
cornell.edu
corriere.it
cpanel.com
cpanel.net
creativecommons.org
csdn.net
csmonitor.com
dailymail.co.uk
dailymotion.com
dan.com
daum.net
debian.org
dell.com
depositfiles.com
detik.com
digg.com
discovery.com
disney.com
disney.go.com
disqus.com
doubleclick.net
dreniq.com
dribbble.com
dropbox.com
dropboxusercontent.com
dw.com
e-recht24.de
ea.com
ebay.co.uk
ebay.com
economist.com
eff.org
ehow.com
elmundo.es
elpais.com
engadget.com
entrepreneur.com
eonline.com
espn.com
espn.go.com
etsy.com
europa.eu
eventbrite.com
example.com
excite.co.jp
express.co.uk
facebook.com
fandom.com
fastcompany.com
fb.com
fb.me
fda.gov
fedoraproject.org
feedburner.com
fifa.com
files.wordpress.com
flickr.com
forbes.com
fortune.com
foursquare.com
foxnews.com
ft.com
ftc.gov
gen.xyz
geocities.jp
gesetze-im-internet.de
ggpht.com
github.com
gizmodo.com
globo.com
gmail.com
gnu.org
godaddy.com
gofundme.com
goo.gl
goo.ne.jp
goodreads.com
google.ca
google.co.id
google.co.in
google.co.jp
google.co.uk
google.com
google.com.br
google.com.hk
google.com.tr
google.de
google.es
google.fr
google.it
google.nl
google.pl
google.ru
googleapis.com
googleblog.com
googleusercontent.com
gooyaabitemplates.com
gov.uk
gravatar.com
greenpeace.org
gstatic.com
guardian.co.uk
harvard.edu
hatena.ne.jp
histats.com
hm.com
hollywoodreporter.com
home.pl
house.gov
howstuffworks.com
hp.com
huffingtonpost.com
huffpost.com
hugedomains.com
ibm.com
ibtimes.com
icann.org
ieee.org
ietf.org
ig.com.br
ign.com
ikea.com
imageshack.us
imdb.com
imgur.com
inc.com
independent.co.uk
indiatimes.com
indiegogo.com
instagram.com
instructables.com
intel.com
interia.pl
issuu.com
istockphoto.com
iubenda.com
jd.com
joomla.org
jquery.com
jstor.org
kickstarter.com
kinja.com
last.fm
latimes.com
lefigaro.fr
lemonde.fr
line.me
linkedin.com
list-manage.com
live.com
livejournal.com
livescience.com
loc.gov
lonelyplanet.com
lycos.com
m.wikipedia.org
mail.ru
marketwatch.com
marriott.com
mashable.com
mediafire.com
medium.com
mega.nz
megaupload.com
mercurynews.com
merriam-webster.com
metro.co.uk
microsoft.com
microsoftonline.com
mirror.co.uk
mit.edu
mixcloud.com
mlb.com
mozilla.com
mozilla.org
msn.com
myspace.com
mysql.com
namecheap.com
narod.ru
nasa.gov
nationalgeographic.com
nature.com
naver.com
naver.jp
nba.com
nbcnews.com
ndtv.com
netflix.com
netsons.com
netvibes.com
networkadvertising.org
news.com.au
newscientist.com
newsweek.com
newyorker.com
nginx.com
nginx.org
nhk.or.jp
nicovideo.jp
nifty.com
nih.gov
nikkei.com
noaa.gov
nokia.com
npr.org
nvidia.com
nydailynews.com
nypost.com
nytimes.com
nyu.edu
odnoklassniki.ru
office.com
offset.com
ok.ru
okezone.com
opera.com
oracle.com
orange.fr
oreilly.com
oup.com
over-blog.com
ovh.co.uk
ovh.com
ovh.net
ox.ac.uk
parallels.com
pastebin.com
paypal.com
pbs.org
pcmag.com
people.com
photobucket.com
php.net
pinterest.com
pixabay.com
playstation.com
plesk.com
plos.org
politico.com
prestashop.com
prezi.com
princeton.edu
privacyshield.gov
prnewswire.com
psychologytoday.com
qq.com
quantcast.com
quora.com
rakuten.co.jp
rambler.ru
rapidshare.com
reddit.com
repubblica.it
researchgate.net
reuters.com
ria.ru
rottentomatoes.com
rt.com
rtve.es
sakura.ne.jp
samsung.com
sapo.pt
scholastic.com
sciencedaily.com
sciencedirect.com
sciencemag.org
scientificamerican.com
scribd.com
seattletimes.com
secureserver.net
sedo.com
seesaa.net
sendspace.com
sfgate.com
shopify.com
shutterstock.com
siemens.com
sina.com.cn
sky.com
skype.com
skyrock.com
slate.com
slideshare.net
sm.cn
smh.com.au
so-net.ne.jp
softonic.com
sogou.com
sohu.com
soratemplates.com
soso.com
soundcloud.com
spiegel.de
spotify.com
springer.com
sputniknews.com
ssl-images-amazon.com
stackoverflow.com
standard.co.uk
stanford.edu
state.gov
steamcommunity.com
steampowered.com
storage.canalblog.com
storage.googleapis.com
stores.jp
storify.com
stuff.co.nz
surveymonkey.com
symantec.com
t-online.de
t.co
t.me
tabelog.com
taobao.com
target.com
teamviewer.com
techcrunch.com
ted.com
telegram.me
telegraph.co.uk
terra.com.br
theatlantic.com
thefreedictionary.com
theglobeandmail.com
theguardian.com
themeforest.net
thenextweb.com
thestar.com
thesun.co.uk
thetimes.co.uk
theverge.com
thoughtco.com
tianya.cn
time.com
tinyurl.com
tmall.com
tmz.com
tribunnews.com
tripadvisor.com
trustpilot.com
twitch.tv
twitter.com
ucoz.ru
uiuc.edu
umich.edu
un.org
undeveloped.com
unesco.org
uol.com.br
urbandictionary.com
usa.gov
usatoday.com
usgs.gov
usnews.com
uspto.gov
ustream.tv
utexas.edu
variety.com
venturebeat.com
vice.com
viglink.com
vimeo.com
vk.com
vkontakte.ru
vox.com
w3.org
w3schools.com
wa.me
walmart.com
washington.edu
washingtonpost.com
wattpad.com
weather.com
web.fc2.com
webmd.com
weebly.com
weibo.com
welt.de
whatsapp.com
whitehouse.gov
who.int
wikia.com
wikihow.com
wikimedia.org
wikipedia.org
wiktionary.org
wiley.com
windowsphone.com
wired.com
wix.com
wordpress.org
worldbank.org
wp.com
wsj.com
xbox.com
xinhuanet.com
yadi.sk
yahoo.co.jp
yahoo.com
yale.edu
yandex.ru
yelp.com
youku.com
youronlinechoices.com
youtu.be
youtube.com
ytimg.com
zdnet.com
zend.com
zendesk.com
zippyshare.com
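The commit message reasons that translate.google.com can be dropped because google.com is already listed. This suggests that a host is meant to match a base form when the base form equals the host or one of its parent domains. Assuming that reading, a lookup against this file could be sketched as follows. `load_top_sites` and `matches_top_site` are hypothetical helpers. Treating lines that start with `#` as comments is an assumption drawn from the file header.

```python
def load_top_sites(lines):
    """Hypothetical loader: one base form per line; blank lines and lines
    starting with '#' are treated as comments and skipped (assumption)."""
    return {line.strip().lower() for line in lines
            if line.strip() and not line.strip().startswith("#")}


def matches_top_site(host, top_sites):
    """Return the listed base form that covers host, or None."""
    # Normalise case and drop a trailing root dot before splitting.
    labels = host.strip().lower().rstrip(".").split(".")
    # Try the full host first, then each parent domain in turn.
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in top_sites:
            return candidate
    return None


sites = load_top_sites(["# Top sites: base URL forms", "", "google.com", "bbc.co.uk"])
print(matches_top_site("translate.google.com", sites))  # google.com
print(matches_top_site("news.bbc.co.uk", sites))        # bbc.co.uk
print(matches_top_site("example.org", sites))           # None
```

Because suffix variants such as google.it are listed separately, a host under google.it would match only if google.it itself is in the loaded set. Walking up parent domains like this is why keeping translate.google.com alongside google.com would add nothing to the lookup.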