Changeset 33914 for other-projects/maori-lang-detection/MoreReading
- Timestamp:
- 2020-02-13T17:09:07+13:00 (4 years ago)
- File:
- 1 edited
other-projects/maori-lang-detection/MoreReading/mongodb.txt
r33913 r33914
1103   1103   - RUSSIA: https://www.gismeteo.lv - misidentification of an email address
1104   1104   - JAPAN: http://yutaka.it-n.jp - many pages of scientific names of (plants?) which are often misdetected as MRI
1105          !! - I reland, ie: https://coggle.it
       1105   !! - IRELAND, IE: https://coggle.it
1106   1106   - IRAN: https://www.dideo.ir/v/yt/d6cgya0ze-E - video title from MaoriTelevision website
1107   1107   - CZECH republic:
…      …
1371   1371   X https://docs.google.com, timetable with occasional Maori language word
1372   1372   + https://drive.google.com, https://drive.google.com/file/d/1NwuzafjddaP8gxI7O_Zapts5bM7mrtwn/preview is an image of Maori number names. But other page on drive.google.com is a NZ certificate or ID (in English) of a person's position.
1373          http://ritusehji.blogspot.com - no page with more than 1 sentence detected. But short string of actual MRI content. Educator blog with pictures and English language content.
       1373   ~+ http://ritusehji.blogspot.com - no page with more than 1 sentence detected. But short string of actual MRI content. Educator blog with pictures and English language content.
1374   1374
1375   1375
…      …
1541   1541   X https://mi.lawyers.cafe - autotranslated
1542   1542   X https://mi.centr-zashity.ru - same as lawyers.cafe above: autotranslated
1543          ! https://policies.oclc.org - not completely translated. Copyright page, privacy statement and cookie statement pages appear to be in Maori. Not sure if autotranslated since other pages aren't available in MI. Dutch equivalent pages seem human translated.
       1543   ~! https://policies.oclc.org - not completely translated. Copyright page, privacy statement and cookie statement pages appear to be in Maori. Not sure if autotranslated since other pages aren't available in MI. Dutch equivalent pages seem human translated.
1544   1544   X http://jobdescriptionsample.org - autotranslated
1545   1545   X http://mi.broadcastbeat.com - autotranslated product site
…      …
1619   1619   IT, AT, RO, CH, RU, BG, MX, JP, CN, IE, IR, FI same
1620   1620
1621          US gained 3:
1622          anglican.org (NEW)
1623          articles.imperialtometric.com (from CA)
1624          daandehn.com (CA)
       1621   US gained 3 + 1 from mi in URL path:
       1622   + anglican.org (NEW)
       1623   X articles.imperialtometric.com (from CA)
       1624   X daandehn.com (from CA)
       1625   + kiwiproperty.com (from AU)
1625   1626
1626   1627   CA lost 2:
1627          articles.imperialtometric.com (to US)
1628          daandehn.com (to US)
       1628   X articles.imperialtometric.com (to US)
       1629   X daandehn.com (to US)
1629   1630
1630   1631   AU:
1631          lost kiwiproperty.com (to US - mi in URL path version file!)
       1632   ! lost kiwiproperty.com (to US - mi in URL path version file!)
1632   1633
1633   1634
1634   1635   CZ:
1635          gained viveipcl.com (from UNKNOWN)
       1636   X gained viveipcl.com (from UNKNOWN)
1636   1637
1637   1638   UNKNOWN:
1638          gained hitiaotera.com from IL
       1639   X gained hitiaotera.com from IL
1639   1640
1640   1641   IL:
1641          lost one to (UNKNOWN)
1642
       1642   X lost one (hitiaotera.com to UNKNOWN)
       1643
       1644
       1645   FINAL SITE COUNT (contain >= 1 page with >= 1 MRI sentence)
       1646
       1647   DK:
       1648   http://ngapuhiradio.com
       1649   http://ngapuhitelevision.com
       1650   [http://akona.ngapuhitelevision.com
       1651   http://waiatarangatiratanga.ngapuhitelevision.com
       1652   http://jazz.ngapuhitelevision.com
       1653   http://powhiri.ngapuhitelevision.com
       1654   http://komisch.ngapuhitelevision.com]
       1655
       1656   DE
       1657   http://www.udhr.de
       1658   https://www.cartogiraffe.com/
       1659
       1660   AU
       1661   https://koreromaori.com
       1662   (https://infogram.com/)
       1663
       1664   FR
       1665   http://chantsdeluttes.free.fr/
       1666
       1667   ES
       1668   https://www.uv.es/
       1669
       1670   IE
       1671   https://coggle.it
       1672
       1673   CZ:
       1674   http://www.henryklahola.nazory.cz
       1675
       1676   BG:
       1677   http://anitra.net/
       1678
       1679   US finals:
       1680   http://anglican.org
       1681   http://anglicanhistory.org
       1682   http://www.unicode.org
       1683   https://static-promote.weebly.com
       1684   http://aclhokiangarocks.blogspot.com
       1685   http://bahaiprayers.net
       1686   https://biblehub.com
       1687   http://www.muhammad.com
       1688   http://www.godrules.net
       1689   http://m.biblepub.com
       1690   http://www.krassotkin.ru
       1691   http://www.gotquestions.org
       1692   https://maorinews.com
       1693   http://maaori.com
       1694   http://kiaorahola.blogspot.com
       1695   https://kjohnsonnz.blogspot.com
       1696   http://pumanawawhangara.blogspot.com
       1697   http://dannykahei.tripod.com
       1698   http://burkekm001.tripod.com
       1699   http://tkkpipipaopao.blogspot.com
       1700   http://manateina.blogspot.com
       1701   http://tatai09.blogspot.com
       1702   http://www.twttoa.com
       1703   http://tuhua2010.blogspot.com
       1704   http://piripi.blogspot.com
       1705   https://www.breaker.audio
       1706   https://drive.google.com
       1707   http://ritusehji.blogspot.com
       1708   https://in.pinterest.com
       1709
       1710   29
       1711
       1712   https://www.kiwiproperty.com
       1713   http://indigenousblogs.com
       1714   https://mi.m.wikipedia.org, https://mi.wikipedia.org
       1715   http://csunplugged.org, https://www.csunplugged.org
       1716   (https://policies.oclc.org)
       1717
       1718   34 incl with MI in URL Path
       1719
       1720
       1721   ---------------------
       1722   NZ:
       1723   http://www.teipukarea.maori.nz
       1724   http://ngatipahauwera.co.nz
       1725   http://www.oag.govt.nz
       1726   https://sexualviolence.victimsinfo.govt.nz
       1727   http://tmoa.tki.org.nz
       1728   http://www.tewhanake.maori.nz
       1729   http://www.matarikifestival.org.nz
       1730   http://www.otepoti.school.nz
       1731   https://www.maoritelevision.com
       1732   http://pukapuka.nz
       1733   http://community.nzdl.org
       1734   http://maori.livingheritage.org.nz [http://www.livingheritage.org.nz]
       1735   http://pukoro.co.nz
       1736   https://cdn.tehiku.nz [DOMAIN: tehiku.nz]
       1737   http://www.runanga.co.nz
       1738   http://kuraaiwi.maori.nz
       1739   http://kurataiao.tki.org.nz
       1740   http://satellites.co.nz
       1741   http://teaohou.natlib.govt.nz
       1742   http://www.tuwharetoa.iwi.nz
       1743   https://www.terito.school.nz
       1744   https://ttw1.cwp.govt.nz
       1745   https://www.whanau-tahi.school.nz
       1746   https://e-ako-pangarau.nzmaths.co.nz
       1747   https://teaomaori.news
       1748   http://tetaurawhiri.govt.nz
       1749   https://www.tuiatematangi.ac.nz
       1750   http://animations.tewhanake.maori.nz
       1751   https://www.dnc.org.nz
       1752   http://firstworldwar.tki.org.nz [http://www.firstworldwar.tki.org.nz]
       1753   http://www.28maoribattalion.org.nz
       1754   http://www.tewikiotereomaori.co.nz
       1755   http://www.brettgraham.co.nz
       1756   https://hepatakakupu.nz
       1757   http://anglicanprayerbook.nz
       1758   http://arataua.nz
       1759   http://maori.tki.org.nz
       1760   https://paekupu.co.nz
       1761   https://haereheikaiako.co.nz
       1762   https://curriculumtool.education.govt.nz
       1763   http://kurakokiri.maori.nz [includes: http://www.kurakokiri.maori.nz]
       1764   http://www.kkmmaungarongo.co.nz
       1765   http://www.heartland.co.nz
       1766   http://oilcrash.com
       1767   http://www.kura-porirua.school.nz
       1768   https://www.sporty.co.nz
       1769   https://www.tematawai.maori.nz
       1770   https://www.terakipaewhenua.school.nz
       1771   http://www.tetaurawhiri.govt.nz
       1772   http://archive.stats.govt.nz
       1773   http://tiritiowaitangi.govt.nz
       1774   http://www.waiata.maori.nz [includes: http://waiata.maori.nz]
       1775   http://hana.co.nz
       1776   http://kaupare.co.nz
       1777   http://www.tereowrap.nz
       1778   http://www.hrc.co.nz
       1779   http://ngatiporoukiponeke.org.nz
       1780   http://rurued.school.nz
       1781   http://www.twtop.school.nz
       1782   http://www.huri-translations.pf
       1783   https://teara.govt.nz/ [https://admin.teara.govt.nz, http://blog.teara.govt.nz]
       1784   https://tiritiowaitangi.govt.nz
       1785   http://www.tmoa.tki.org.nz
       1786   https://www.komako.org.nz
       1787   http://www.wcl.govt.nz [included: http://kete.wcl.govt.nz]
       1788   http://punareo.co.nz
       1789   https://rapuatearatika.education.govt.nz
       1790   http://tmmkkm.school.nz
       1791   http://www.cs.waikato.ac.nz
       1792   http://www.kupengahao.co.nz
       1793   https://www.hapuhauora.health.nz
       1794   http://cms.sunsmartschools.co.nz [http://sunsmartschools.co.nz/]
       1795   http://kuraproductions.co.nz
       1796   https://keepourmoneyclean.govt.nz
       1797   http://www.tekura.school.nz
       1798   http://www.tkkmmokopuna.school.nz
       1799   http://hangaraumatihiko.tki.org.nz
       1800   http://www.pakanae.maori.nz
       1801
       1802
       1803   http://holyspirit.nz
       1804   https://www.ngamanawainc.co.nz, [includes http://www.ngamanawainc.co.nz]
       1805   http://www.finlaysonpark.school.nz
       1806   http://www.w3vietnam.org.nz [includes http://w3vietnam.org.nz]
       1807   https://www.takitimu.ac.nz
       1808   https://kotahimiriona.co.nz
       1809   https://rehuamarae.co.nz
       1810   http://reoora.co.nz
       1811
       1812   https://manawatuheritage.pncc.govt.nz
       1813   http://rsnz.natlib.govt.nz
       1814   https://www.taitokerautrust.org.nz
       1815   http://tewikiotereomaori.nz
       1816   https://www.korokikahukura.co.nz
       1817   https://www.pinterest.nz
       1818   https://www.rereahu.maori.nz
       1819   http://givealittle.co.nz
       1820   https://kaiiwicamp.nz [includes http://kaiiwicamp.nz]
       1821   http://ngarauhuia.ngatiapakiterato.iwi.nz
       1822   https://m.wairarapatv.co.nz
       1823
       1824   http://avonside.net
       1825   http://www.maoriinvestments.co.nz
       1826   http://conference.tpwt.maori.nz
       1827   https://www.puau.school.nz
       1828   http://tehauora.org.nz
       1829
       1830   http://temahurehure.maori.nz
       1831   http://www.temarareo.org
       1832   http://www.tetaumuturunanga.iwi.nz
       1833   http://www.writersfestival.co.nz
       1834   http://www.kmk.maori.nz
       1835   https://www.stats.govt.nz [includes http://archive.stats.govt.nz]
       1836
       1837   +? http://ngatiwhakaue.iwi.nz
       1838   +? https://interactives.stuff.co.nz
       1839   +? http://whatonga.school.nz
       1840   +? https://player.vimeo.com
       1841   +? http://southerntribes.co.nz
       1842
       1843   ?X https://www.e-agent.nz [includes: https://office.e-agent.nz, http://videos.e-agent.nz]