source: main/trunk/model-sites-dev/hathitrust/collect/capisco-european-pacific-encounters/etc/conf/lang/stoptags_ja.txt@ 31289

Last change on this file since 31289 was 31289, checked in by davidb, 7 years ago

initial setup files for collection

File size: 16.3 KB
#
# This file defines a Japanese stoptag set for JapanesePartOfSpeechStopFilter.
#
# Any token whose part-of-speech tag exactly matches one of the tags defined in
# this file is removed from the token stream.
#
# Set your own stoptags by uncommenting the lines below. Note that comments are
# not allowed on the same line as a stoptag. See LUCENE-3745 for frequency lists,
# etc. that can be useful for building your own stoptag set.
#
# The entire possible tagset is provided below for convenience.
#
#####
# noun: unclassified nouns
#名詞
#
# noun-common: Common nouns or nouns where the sub-classification is undefined
#名詞-一般
#
# noun-proper: Proper nouns where the sub-classification is undefined
#名詞-固有名詞
#
# noun-proper-misc: miscellaneous proper nouns
#名詞-固有名詞-一般
#
# noun-proper-person: Personal names where the sub-classification is undefined
#名詞-固有名詞-人名
#
# noun-proper-person-misc: names that cannot be divided into surname and
# given name; foreign names; names where the surname or given name is unknown.
# e.g. お市の方
#名詞-固有名詞-人名-一般
#
# noun-proper-person-surname: Mainly Japanese surnames.
# e.g. 山田
#名詞-固有名詞-人名-姓
#
# noun-proper-person-given_name: Mainly Japanese given names.
# e.g. 太郎
#名詞-固有名詞-人名-名
#
# noun-proper-organization: Names representing organizations.
# e.g. 通産省, NHK
#名詞-固有名詞-組織
#
# noun-proper-place: Place names where the sub-classification is undefined
#名詞-固有名詞-地域
#
# noun-proper-place-misc: Place names excluding countries.
# e.g. アジア, バルセロナ, 京都
#名詞-固有名詞-地域-一般
#
# noun-proper-place-country: Country names.
# e.g. 日本, オーストラリア
#名詞-固有名詞-地域-国
#
# noun-pronoun: Pronouns where the sub-classification is undefined
#名詞-代名詞
#
# noun-pronoun-misc: miscellaneous pronouns:
# e.g. それ, ここ, あいつ, あなた, あちこち, いくつ, どこか, なに, みなさん, みんな, わたくし, われわれ
#名詞-代名詞-一般
#
# noun-pronoun-contraction: Spoken language contraction made by combining a
# pronoun and the particle 'wa'.
# e.g. ありゃ, こりゃ, こりゃあ, そりゃ, そりゃあ
#名詞-代名詞-縮約
#
# noun-adverbial: Temporal nouns such as names of days or months that behave
# like adverbs. Nouns that represent amount or ratios and can be used adverbially,
# e.g. 金曜, 一月, 午後, 少量
#名詞-副詞可能
#
# noun-verbal: Nouns that take arguments with case and can appear followed by
# 'suru' and related verbs (する, できる, なさる, くださる)
# e.g. インプット, 愛着, 悪化, 悪戦苦闘, 一安心, 下取り
#名詞-サ変接続
#
# noun-adjective-base: The base form of adjectives, words that appear before な ("na")
# e.g. 健康, 安易, 駄目, だめ
#名詞-形容動詞語幹
#
# noun-numeric: Arabic numbers, Chinese numerals, and counters like 何 (回), 数.
# e.g. 0, 1, 2, 何, 数, 幾
#名詞-数
#
# noun-affix: noun affixes where the sub-classification is undefined
#名詞-非自立
#
# noun-affix-misc: Of adnominalizers, the case-marker の ("no"), and words that
# attach to the base form of inflectional words, words that cannot be classified
# into any of the other categories below. This category includes indefinite nouns.
# e.g. あかつき, 暁, かい, 甲斐, 気, きらい, 嫌い, くせ, 癖, こと, 事, ごと, 毎, しだい, 次第,
# 順, せい, 所為, ついで, 序で, つもり, 積もり, 点, どころ, の, はず, 筈, はずみ, 弾み,
# 拍子, ふう, ふり, 振り, ほう, 方, 旨, もの, 物, 者, ゆえ, 故, ゆえん, 所以, わけ, 訳,
# わり, 割り, 割, ん-口語/, もん-口語/
#名詞-非自立-一般
#
# noun-affix-adverbial: noun affixes that can behave as adverbs.
# e.g. あいだ, 間, あげく, 挙げ句, あと, 後, 余り, 以外, 以降, 以後, 以上, 以前, 一方, うえ,
# 上, うち, 内, おり, 折り, かぎり, 限り, きり, っきり, 結果, ころ, 頃, さい, 際, 最中, さなか,
# 最中, じたい, 自体, たび, 度, ため, 為, つど, 都度, とおり, 通り, とき, 時, ところ, 所,
# とたん, 途端, なか, 中, のち, 後, ばあい, 場合, 日, ぶん, 分, ほか, 他, まえ, 前, まま,
# 儘, 侭, みぎり, 矢先
#名詞-非自立-副詞可能
#
# noun-affix-aux: noun affixes treated as 助動詞 ("auxiliary verb") in school grammars
# with the stem よう(だ) ("you(da)").
# e.g. よう, やう, 様 (よう)
#名詞-非自立-助動詞語幹
#
# noun-affix-adjective-base: noun affixes that can connect to the indeclinable
# connection form な (aux "da").
# e.g. みたい, ふう
#名詞-非自立-形容動詞語幹
#
# noun-special: special nouns where the sub-classification is undefined.
#名詞-特殊
#
# noun-special-aux: The そうだ ("souda") stem form that is used for reporting news, is
# treated as 助動詞 ("auxiliary verb") in school grammars, and attach to the base
# form of inflectional words.
# e.g. そう
#名詞-特殊-助動詞語幹
#
# noun-suffix: noun suffixes where the sub-classification is undefined.
#名詞-接尾
#
# noun-suffix-misc: Of the nouns or stem forms of other parts of speech that connect
# to ガル or タイ and can combine into compound nouns, words that cannot be classified into
# any of the other categories below. In general, this category is more inclusive than
# 接尾語 ("suffix") and is usually the last element in a compound noun.
# e.g. おき, かた, 方, 甲斐 (がい), がかり, ぎみ, 気味, ぐるみ, (した) さ, 次第, 済 (ず) み,
# よう, (でき)っこ, 感, 観, 性, 学, 類, 面, 用
#名詞-接尾-一般
#
# noun-suffix-person: Suffixes that form nouns and attach to person names more often
# than other nouns.
# e.g. 君, 様, 著
#名詞-接尾-人名
#
# noun-suffix-place: Suffixes that form nouns and attach to place names more often
# than other nouns.
# e.g. 町, 市, 県
#名詞-接尾-地域
#
# noun-suffix-verbal: Of the suffixes that attach to nouns and form nouns, those that
# can appear before スル ("suru").
# e.g. 化, 視, 分け, 入り, 落ち, 買い
#名詞-接尾-サ変接続
#
# noun-suffix-aux: The stem form of そうだ (様態) that is used to indicate conditions,
# is treated as 助動詞 ("auxiliary verb") in school grammars, and attach to the
# conjunctive form of inflectional words.
# e.g. そう
#名詞-接尾-助動詞語幹
#
# noun-suffix-adjective-base: Suffixes that attach to other nouns or the conjunctive
# form of inflectional words and appear before the copula だ ("da").
# e.g. 的, げ, がち
#名詞-接尾-形容動詞語幹
#
# noun-suffix-adverbial: Suffixes that attach to other nouns and can behave as adverbs.
# e.g. 後 (ご), 以後, 以降, 以前, 前後, 中, 末, 上, 時 (じ)
#名詞-接尾-副詞可能
#
# noun-suffix-classifier: Suffixes that attach to numbers and form nouns. This category
# is more inclusive than 助数詞 ("classifier") and includes common nouns that attach
# to numbers.
# e.g. 個, つ, 本, 冊, パーセント, cm, kg, カ月, か国, 区画, 時間, 時半
#名詞-接尾-助数詞
#
# noun-suffix-special: Special suffixes that mainly attach to inflecting words.
# e.g. (楽し) さ, (考え) 方
#名詞-接尾-特殊
#
# noun-suffix-conjunctive: Nouns that behave like conjunctions and join two words
# together.
# e.g. (日本) 対 (アメリカ), 対 (アメリカ), (3) 対 (5), (女優) 兼 (主婦)
#名詞-接続詞的
#
# noun-verbal_aux: Nouns that attach to the conjunctive particle て ("te") and are
# semantically verb-like.
# e.g. ごらん, ご覧, 御覧, 頂戴
#名詞-動詞非自立的
#
# noun-quotation: text that cannot be segmented into words, proverbs, Chinese poetry,
# dialects, English, etc. Currently, the only entry for 名詞 引用文字列 ("noun quotation")
# is いわく ("iwaku").
#名詞-引用文字列
#
# noun-nai_adjective: Words that appear before the auxiliary verb ない ("nai") and
# behave like an adjective.
# e.g. 申し訳, 仕方, とんでも, 違い
#名詞-ナイ形容詞語幹
#
#####
# prefix: unclassified prefixes
#接頭詞
#
# prefix-nominal: Prefixes that attach to nouns (including adjective stem forms)
# excluding numerical expressions.
# e.g. お (水), 某 (氏), 同 (社), 故 (氏), 高 (品質), お (見事), ご (立派)
#接頭詞-名詞接続
#
# prefix-verbal: Prefixes that attach to the imperative form of a verb or a verb
# in conjunctive form followed by なる/なさる/くださる.
# e.g. お (読みなさい), お (座り)
#接頭詞-動詞接続
#
# prefix-adjectival: Prefixes that attach to adjectives.
# e.g. お (寒いですねえ), バカ (でかい)
#接頭詞-形容詞接続
#
# prefix-numerical: Prefixes that attach to numerical expressions.
# e.g. 約, およそ, 毎時
#接頭詞-数接続
#
#####
# verb: unclassified verbs
#動詞
#
# verb-main:
#動詞-自立
#
# verb-auxiliary:
#動詞-非自立
#
# verb-suffix:
#動詞-接尾
#
#####
# adjective: unclassified adjectives
#形容詞
#
# adjective-main:
#形容詞-自立
#
# adjective-auxiliary:
#形容詞-非自立
#
# adjective-suffix:
#形容詞-接尾
#
#####
# adverb: unclassified adverbs
#副詞
#
# adverb-misc: Words that can be segmented into one unit and where adnominal
# modification is not possible.
# e.g. あいかわらず, 多分
#副詞-一般
#
# adverb-particle_conjunction: Adverbs that can be followed by の, は, に,
# な, する, だ, etc.
# e.g. こんなに, そんなに, あんなに, なにか, なんでも
#副詞-助詞類接続
#
#####
# adnominal: Words that only have noun-modifying forms.
# e.g. この, その, あの, どの, いわゆる, なんらかの, 何らかの, いろんな, こういう, そういう, ああいう,
# どういう, こんな, そんな, あんな, どんな, 大きな, 小さな, おかしな, ほんの, たいした,
# 「(, も) さる (ことながら)」, 微々たる, 堂々たる, 単なる, いかなる, 我が」「同じ, 亡き
#連体詞
#
#####
# conjunction: Conjunctions that can occur independently.
# e.g. が, けれども, そして, じゃあ, それどころか
接続詞
#
#####
# particle: unclassified particles.
助詞
#
# particle-case: case particles where the subclassification is undefined.
助詞-格助詞
#
# particle-case-misc: Case particles.
# e.g. から, が, で, と, に, へ, より, を, の, にて
助詞-格助詞-一般
#
# particle-case-quote: the "to" that appears after nouns, a person’s speech,
# quotation marks, expressions of decisions from a meeting, reasons, judgements,
# conjectures, etc.
# e.g. ( だ) と (述べた.), ( である) と (して執行猶予...)
助詞-格助詞-引用
#
# particle-case-compound: Compounds of particles and verbs that mainly behave
# like case particles.
# e.g. という, といった, とかいう, として, とともに, と共に, でもって, にあたって, に当たって, に当って,
# にあたり, に当たり, に当り, に当たる, にあたる, において, に於いて,に於て, における, に於ける,
# にかけ, にかけて, にかんし, に関し, にかんして, に関して, にかんする, に関する, に際し,
# に際して, にしたがい, に従い, に従う, にしたがって, に従って, にたいし, に対し, にたいして,
# に対して, にたいする, に対する, について, につき, につけ, につけて, につれ, につれて, にとって,
# にとり, にまつわる, によって, に依って, に因って, により, に依り, に因り, による, に依る, に因る,
# にわたって, にわたる, をもって, を以って, を通じ, を通じて, を通して, をめぐって, をめぐり, をめぐる,
# って-口語/, ちゅう-関西弁「という」/, (何) ていう (人)-口語/, っていう-口語/, といふ, とかいふ
助詞-格助詞-連語
#
# particle-conjunctive:
# e.g. から, からには, が, けれど, けれども, けど, し, つつ, て, で, と, ところが, どころか, とも, ども,
# ながら, なり, ので, のに, ば, ものの, や ( した), やいなや, (ころん) じゃ(いけない)-口語/,
# (行っ) ちゃ(いけない)-口語/, (言っ) たって (しかたがない)-口語/, (それがなく)ったって (平気)-口語/
助詞-接続助詞
#
# particle-dependency:
# e.g. こそ, さえ, しか, すら, は, も, ぞ
助詞-係助詞
#
# particle-adverbial:
# e.g. がてら, かも, くらい, 位, ぐらい, しも, (学校) じゃ(これが流行っている)-口語/,
# (それ)じゃあ (よくない)-口語/, ずつ, (私) なぞ, など, (私) なり (に), (先生) なんか (大嫌い)-口語/,
# (私) なんぞ, (先生) なんて (大嫌い)-口語/, のみ, だけ, (私) だって-口語/, だに,
# (彼)ったら-口語/, (お茶) でも (いかが), 等 (とう), (今後) とも, ばかり, ばっか-口語/, ばっかり-口語/,
# ほど, 程, まで, 迄, (誰) も (が)([助詞-格助詞] および [助詞-係助詞] の前に位置する「も」)
助詞-副助詞
#
# particle-interjective: particles with interjective grammatical roles.
# e.g. (松島) や
助詞-間投助詞
#
# particle-coordinate:
# e.g. と, たり, だの, だり, とか, なり, や, やら
助詞-並立助詞
#
# particle-final:
# e.g. かい, かしら, さ, ぜ, (だ)っけ-口語/, (とまってる) で-方言/, な, ナ, なあ-口語/, ぞ, ね, ネ,
# ねぇ-口語/, ねえ-口語/, ねん-方言/, の, のう-口語/, や, よ, ヨ, よぉ-口語/, わ, わい-口語/
助詞-終助詞
#
# particle-adverbial/conjunctive/final: The particle "ka" when unknown whether it is
# adverbial, conjunctive, or sentence final. For example:
# (a) 「A か B か」. Ex:「(国内で運用する) か,(海外で運用する) か (.)」
# (b) Inside an adverb phrase. Ex:「(幸いという) か (, 死者はいなかった.)」
# 「(祈りが届いたせい) か (, 試験に合格した.)」
# (c) 「かのように」. Ex:「(何もなかった) か (のように振る舞った.)」
# e.g. か
助詞-副助詞/並立助詞/終助詞
#
# particle-adnominalizer: The "no" that attaches to nouns and modifies
# non-inflectional words.
助詞-連体化
#
# particle-adnominalizer: The "ni" and "to" that appear following nouns and adverbs
# that are giongo, giseigo, or gitaigo.
# e.g. に, と
助詞-副詞化
#
# particle-special: A particle that does not fit into one of the above classifications.
# This includes particles that are used in Tanka, Haiku, and other poetry.
# e.g. かな, けむ, ( しただろう) に, (あんた) にゃ(わからん), (俺) ん (家)
助詞-特殊
#
#####
# auxiliary-verb:
助動詞
#
#####
# interjection: Greetings and other exclamations.
# e.g. おはよう, おはようございます, こんにちは, こんばんは, ありがとう, どうもありがとう, ありがとうございます,
# いただきます, ごちそうさま, さよなら, さようなら, はい, いいえ, ごめん, ごめんなさい
#感動詞
#
#####
# symbol: unclassified Symbols.
記号
#
# symbol-misc: A general symbol not in one of the categories below.
# e.g. [○◎@$〒→+]
記号-一般
#
# symbol-comma: Commas
# e.g. [,、]
記号-読点
#
# symbol-period: Periods and full stops.
# e.g. [.。]
記号-句点
#
# symbol-space: Full-width whitespace.
記号-空白
#
# symbol-open_bracket:
# e.g. [({‘“『【]
記号-括弧開
#
# symbol-close_bracket:
# e.g. [)}’”』」】]
記号-括弧閉
#
# symbol-alphabetic:
#記号-アルファベット
#
#####
# other: unclassified other
#その他
#
# other-interjection: Words that are hard to classify as noun-suffixes or
# sentence-final particles.
# e.g. (だ)ァ
その他-間投
#
#####
# filler: Aizuchi that occurs during a conversation or sounds inserted as filler.
# e.g. あの, うんと, えと
フィラー
#
#####
# non-verbal: non-verbal sound.
非言語音
#
#####
# fragment:
#語断片
#
#####
# unknown: unknown part of speech.
#未知語
#
##### End of file
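
The file's header comments describe three rules: lines beginning with `#` are comments, every other non-blank line is an active stoptag, and a token is dropped only when its part-of-speech tag matches a stoptag exactly. The Python sketch below illustrates those rules. It is not Lucene's implementation. `load_stoptags` and `remove_stoptagged` are hypothetical helper names, and the (surface, tag) pairs are hand-written sample data, not real tokenizer output.

```python
from typing import Iterable, List, Set, Tuple

Token = Tuple[str, str]  # (surface form, part-of-speech tag); assumed shape


def load_stoptags(text: str) -> Set[str]:
    """Collect active stoptags: non-blank lines that do not start with '#'."""
    tags: Set[str] = set()
    # Split on "\n" only, so unusual Unicode line separators inside the
    # text are never treated as line breaks.
    for line in text.split("\n"):
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line, comment, or commented-out tag
        tags.add(line)
    return tags


def remove_stoptagged(tokens: Iterable[Token], stoptags: Set[str]) -> List[Token]:
    """Keep only tokens whose tag does not exactly match any stoptag."""
    return [(surface, pos) for surface, pos in tokens if pos not in stoptags]


# A small excerpt in the same format as the file above.
sample = """\
# conjunction: Conjunctions that can occur independently.
接続詞
# particle-case-misc: Case particles.
助詞-格助詞-一般
# noun-common stays commented out, so it is not an active stoptag.
#名詞-一般
"""

stoptags = load_stoptags(sample)

# Hand-written (surface, tag) pairs for illustration only.
tokens = [
    ("寿司", "名詞-一般"),
    ("が", "助詞-格助詞-一般"),
    ("好き", "名詞-形容動詞語幹"),
]
kept = remove_stoptagged(tokens, stoptags)
print(kept)
```

Because matching is exact, activating `助詞-格助詞-一般` in this sketch does not remove tokens tagged `助詞-格助詞-引用` or the bare `助詞`. Each tag to be filtered has to be listed on its own line.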