Last change on this file since 33551 was 33551, checked in by ak19, 5 years ago
Added in top 500 urls from moz.com/top500 and removed duplicates, and removed subdomain variants keeping just main site variant, and sorted alphabetically again.
File size: 5.8 KB

# Added Alexa top sites (only 50 visible)
# Added further top sites from https://en.wikipedia.org/wiki/List_of_most_popular_websites
# Finally, also got the CSV from https://moz.com/top500 and added its entries to the list.
# Then used LibreOffice's Calc spreadsheet software to sort alphabetically and remove duplicates.
# Then, in Gedit, used regex search and replace to reduce <subdomain>.<site>.ext to just <site>.ext
# And re-sorted alphabetically

000webhost.com
360.cn
4shared.com
a8.net
abc.es
abc.net.au
abcnews.go.com
about.com
about.me
aboutads.info
abril.com.br
academia.edu
accuweather.com
addthis.com
addtoany.com
adobe.com
airbnb.com
akamaihd.net
alexa.com
alibaba.com
aliexpress.com
alipay.com
aljazeera.com
allaboutcookies.org
allrecipes.com
amazon.
ampproject.org
android.com
aol.com
ap.org
apache.org
apachefriends.org
apple.com
archive.org
arstechnica.com
arxiv.org
asahi.com
ask.fm
asus.com
axs.com
babytree.com
baidu.com
bandcamp.com
bbc.co.uk
bbc.com
berkeley.edu
biblegateway.com
biglobe.ne.jp
billboard.com
bing.com
bit.ly
bitly.com
blackberry.com
blogger.com
blogspot.com
bloomberg.com
booking.com
box.com
britannica.com
bt.com
bund.de
businessinsider.com
businesswire.com
buydomains.com
buzzfeed.com
ca.gov
cambridge.org
cbc.ca
cbsnews.com
cdc.gov
change.org
channel4.com
chicagotribune.com
cisco.com
clickbank.net
cloudflare.com
cnbc.com
cnet.com
cnn.com
cocolog-nifty.com
columbia.edu
cornell.edu
corriere.it
cpanel.com
cpanel.net
creativecommons.org
csdn.net
csmonitor.com
dailymail.co.uk
dailymotion.com
dan.com
daum.net
dell.com
depositfiles.com
detik.com
digg.com
disney.com
disqus.com
doubleclick.net
dreniq.com
dribbble.com
dropbox.com
dropboxusercontent.com
dw.com
e-recht24.de
ea.com
ebay.co.uk
ebay.com
economist.com
eff.org
ehow.com
elmundo.es
elpais.com
engadget.com
entrepreneur.com
eonline.com
espn.com
espn.go.com
etsy.com
europa.eu
eventbrite.com
example.com
excite.co.jp
express.co.uk
facebook.com
fandom.com
fastcompany.com
fb.com
fb.me
fda.gov
fedoraproject.org
feedburner.com
fifa.com
files.wordpress.com
flickr.com
forbes.com
fortune.com
foursquare.com
foxnews.com
ft.com
ftc.gov
gen.xyz
geocities.jp
gesetze-im-internet.de
ggpht.com
github.com
gizmodo.com
globo.com
gmail.com
gnu.org
godaddy.com
gofundme.com
goo.gl
goo.ne.jp
goodreads.com
google.
googleblog.com
googleusercontent.com
gooyaabitemplates.com
gov.uk
gravatar.com
greenpeace.org
gstatic.com
guardian.co.uk
harvard.edu
hatena.ne.jp
histats.com
hm.com
hollywoodreporter.com
home.pl
house.gov
howstuffworks.com
hp.com
huffingtonpost.com
huffpost.com
hugedomains.com
ibm.com
ibtimes.com
icann.org
ieee.org
ietf.org
ig.com.br
ign.com
ikea.com
imageshack.us
imdb.com
imgur.com
inc.com
independent.co.uk
indiatimes.com
indiegogo.com
instagram.com
intel.com
issuu.com
istockphoto.com
iubenda.com
jd.com
joomla.org
jquery.com
jstor.org
kickstarter.com
kinja.com
last.fm
latimes.com
lefigaro.fr
lemonde.fr
line.me
linkedin.com
list-manage.com
live.com
livejournal.com
livescience.com
loc.gov
lycos.com
mail.ru
marketwatch.com
marriott.com
mashable.com
mediafire.com
medium.com
mega.nz
mercurynews.com
merriam-webster.com
metro.co.uk
microsoft.com
microsoftonline.com
mirror.co.uk
mit.edu
mixcloud.com
mlb.com
mozilla.com
mozilla.org
msn.com
myspace.com
mysql.com
namecheap.com
narod.ru
nasa.gov
nationalgeographic.com
nature.com
naver.com
naver.jp
nbcnews.com
ndtv.com
netflix.com
netsons.com
netvibes.com
networkadvertising.org
news.com.au
newscientist.com
newsweek.com
nginx.com
nginx.org
nhk.or.jp
nicovideo.jp
nifty.com
nih.gov
nikkei.com
noaa.gov
nokia.com
npr.org
nvidia.com
nydailynews.com
nypost.com
nytimes.com
nyu.edu
odnoklassniki.ru
office.com
ok.ru
okezone.com
opera.com
oracle.com
orange.fr
oreilly.com
oup.com
over-blog.com
ovh.co.uk
ovh.com
ovh.net
ox.ac.uk
parallels.com
pastebin.com
paypal.com
pbs.org
people.com
photobucket.com
php.net
pinterest.com
pixabay.com
playstation.com
plesk.com
politico.com
prezi.com
princeton.edu
privacyshield.gov
prnewswire.com
psychologytoday.com
qq.com
quantcast.com
quora.com
rakuten.co.jp
rambler.ru
rapidshare.com
reddit.com
repubblica.it
reuters.com
ria.ru
rottentomatoes.com
rt.com
rtve.es
samsung.com
sapo.pt
sciencedaily.com
sciencedirect.com
sciencemag.org
scientificamerican.com
scribd.com
seattletimes.com
secureserver.net
sedo.com
seesaa.net
sendspace.com
sfgate.com
shopify.com
shutterstock.com
siemens.com
sina.com.cn
sky.com
skype.com
skyrock.com
slideshare.net
sm.cn
smh.com.au
so-net.ne.jp
softonic.com
sogou.com
sohu.com
soratemplates.com
soso.com
soundcloud.com
spiegel.de
spotify.com
springer.com
sputniknews.com
stackoverflow.com
stanford.edu
state.gov
steamcommunity.com
steampowered.com
storage.canalblog.com
stores.jp
storify.com
stuff.co.nz
surveymonkey.com
symantec.com
t-online.de
t.co
t.me
tabelog.com
taobao.com
target.com
techcrunch.com
ted.com
telegram.me
telegraph.co.uk
terra.com.br
theglobeandmail.com
theguardian.com
themeforest.net
thestar.com
thesun.co.uk
thetimes.co.uk
theverge.com
thoughtco.com
tianya.cn
time.com
tinyurl.com
tmall.com
tmz.com
tribunnews.com
tripadvisor.com
trustpilot.com
twitch.tv
twitter.com
ucoz.ru
uiuc.edu
umich.edu
un.org
undeveloped.com
unesco.org
uol.com.br
urbandictionary.com
usatoday.com
usgs.gov
usnews.com
uspto.gov
ustream.tv
utexas.edu
variety.com
venturebeat.com
vice.com
viglink.com
vimeo.com
vk.com
vkontakte.ru
vox.com
w3.org
w3schools.com
wa.me
walmart.com
washington.edu
washingtonpost.com
wattpad.com
web.fc2.com
webmd.com
weebly.com
weibo.com
welt.de
whatsapp.com
whitehouse.gov
who.int
wikia.com
wikihow.com
wikimedia.org
wikipedia.org
wikipedia.org
wikipedia.org
wiktionary.org
wiley.com
windowsphone.com
wired.com
wix.com
wordpress.org
worldbank.org
wp.com
wsj.com
xbox.com
xinhuanet.com
yadi.sk
yahoo.co.
yahoo.com
yahoo.com
yale.edu
yandex.ru
yelp.com
youku.com
youronlinechoices.com
youtu.be
youtube.com
ytimg.com
zdnet.com
zendesk.com
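
The clean-up described in the file's header comments (merge the sources, reduce subdomain variants to the main site, remove duplicates, sort alphabetically) was done by hand in LibreOffice Calc and Gedit. As a rough illustration only, the same steps could be scripted along the following lines. The function names and the small `MULTI_LABEL_SUFFIXES` set are assumptions made for this sketch, not part of the repository; a real run would need the full Public Suffix List to tell `bbc.co.uk` apart from a subdomain. The file also keeps a few multi-label entries such as `abcnews.go.com` and `files.wordpress.com`, so the author's regex was evidently more selective than this sketch, which would collapse them.

```python
# Hypothetical, hand-picked suffixes that span two labels; this is only an
# illustrative subset, not the real Public Suffix List.
MULTI_LABEL_SUFFIXES = {
    "co.uk", "ac.uk", "com.au", "net.au", "com.br",
    "co.jp", "ne.jp", "or.jp", "com.cn", "co.nz",
}


def registrable_domain(host: str) -> str:
    """Reduce <subdomain>.<site>.ext to <site>.ext (naive sketch)."""
    labels = host.strip().lower().split(".")
    # Keep three labels when the last two form a known two-label suffix,
    # otherwise keep only the last two labels.
    keep = 3 if ".".join(labels[-2:]) in MULTI_LABEL_SUFFIXES else 2
    return ".".join(labels[-keep:])


def clean_site_list(lines):
    """Skip comments and blank lines, strip subdomains, dedupe, and sort."""
    hosts = {
        registrable_domain(line)
        for line in lines
        if line.strip() and not line.startswith("#")
    }
    return sorted(hosts)


if __name__ == "__main__":
    sample = [
        "# comment",
        "",
        "en.wikipedia.org",
        "wikipedia.org",
        "news.bbc.co.uk",
        "Apple.com",
    ]
    print(clean_site_list(sample))
    # ['apple.com', 'bbc.co.uk', 'wikipedia.org']
```

Collecting the hosts in a set before sorting does the deduplication and the alphabetical ordering in one pass, so the result does not depend on the order in which the source lists were merged.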