source: main/trunk/model-sites-dev/hathitrust/collect/capisco-european-pacific-encounters/etc/conf/lang/userdict_ja.txt@ 31289

Last change on this file since 31289 was 31289, checked in by davidb, 7 years ago

initial setup files for collection

File size: 1.3 KB
#
# This is a sample user dictionary for Kuromoji (JapaneseTokenizer)
#
# Add entries to this file in order to override the statistical model in terms
# of segmentation, readings and part-of-speech tags. Notice that entries do
# not have weights since they are always used when found. This is by design
# in order to maximize ease of use.
#
# Entries are defined using the following CSV format:
# <text>,<token 1> ... <token n>,<reading 1> ... <reading n>,<part-of-speech tag>
#
# Notice that a single half-width space separates tokens and readings, and
# that the number of tokens and readings must match exactly.
#
# Also notice that the behavior of multiple entries with the same <text> is
# undefined.
#
# Whitespace-only lines are ignored. Comments are not allowed on entry lines.
#

# Custom segmentation for kanji compounds
日本経済新聞,日本 経済 新聞,ニホン ケイザイ シンブン,カスタム名詞
関西国際空港,関西 国際 空港,カンサイ コクサイ クウコウ,カスタム名詞

# Custom segmentation for compound katakana
トートバッグ,トート バッグ,トート バッグ,かずカナ名詞
ショルダーバッグ,ショルダー バッグ,ショルダー バッグ,かずカナ名詞

# Custom reading for former sumo wrestler
朝青龍,朝青龍,アサショウリュウ,カスタム人名