				Modified at: Sat Jun 25 16:15:36 JST 1994

                           Hironobu TAKAHASHI (Tsukuba Internet Club)

                         e-mail: takahasi@tiny.or.jp (From domestic)
                                 hironobu@rwcp.or.jp (From foreign country)

* What is KAKASI ?

  KAKASI is a language processing filter that converts Kanji characters
to Hiragana, Katakana, or Romaji(1). It may be helpful for reading
Japanese documents.

  The name "KAKASI" is an abbreviation of "kanji kana simple
inverter", the inverse of SKK ("simple kana kanji converter"), which
was developed by Masahiko Sato at Tohoku University. Most entries of
the kakasi dictionary are derived from the SKK dictionaries. If you
are interested in the naming of "KAKASI", please consult a
Japanese-English dictionary. :-)

  The KAKASI kit includes C programs and several kinds of
dictionaries. The large dictionary "kakasidict" may be distributed as
a separate file, "kakasidict.YYMMDD.gz", because it is revised often
and is really LARGE.

 (1) "Romaji" is the alphabetical transcription of Japanese pronunciation.


* How to install KAKASI ?

1. Extract the files from the kakasi archive using gzip and tar, for
   example:

   % gzip -dc kakasi-2.2.5.tar.gz | tar xf -

   If "kakasidict" is not included in the same archive, you may find
   it as a separate file named "kakasidict.YYMMDD.gz" in the same
   place. Uncompress and rename it:

   % gzip -d kakasidict.940620.gz
   % mv kakasidict.940620 kakasidict

2. Edit the Makefile to fit your environment. The variables are:

   PREFIX:  Top directory under which the program and library are installed.
   CC:      C Compiler ( cc, gcc, acc... )
   OPTIONS: C Compiler options such as -O -O2 ...

3. Run make.

   % make

4. If the installation directories [ $(PREFIX)/bin $(PREFIX)/lib/kakasi ]
   do not exist, you must create them.

5. Install, usually as root:

   # make install

* How to use KAKASI ?

   KAKASI acts as a simple filter.

   % kakasi [options] [dict1 [dict2 ...]] < input_file > output_file

   If you invoke kakasi with no options, it appears to do nothing.

   Options are categorized as follows.

   a. Help !

      "kakasi -h" shows all of the available options.

   b. Character set conversion

      KAKASI distinguishes several character sets, indicated by the
      following mnemonics: a, j, g, k, E, H, K, J.

      a --- ASCII characters
      j --- JIS ROMAN, defined by JIS x0201 (nearly equal to ASCII,
            except that "~" and "\" are different)
      g --- DEC Graphic Characters
      k --- KATAKANA defined by JIS x0201

      E, H, K, and J are parts of the JIS x0208 character set.

      J --- KANJI characters of JIS x0208.
      H --- HIRAGANA characters of JIS x0208.
      K --- KATAKANA characters of JIS x0208.
      E --- The remaining characters of JIS x0208, including
            alphabets, numerals, symbols and so on.
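      To make these categories concrete, here is a small C sketch that
      classifies one character of EUC-encoded text into the mnemonics
      above by looking at its bytes. It is not taken from the KAKASI
      sources, and the name classify_euc is hypothetical. It assumes
      euc input. In JIS x0208, row 4 holds HIRAGANA, row 5 holds
      KATAKANA, and rows 16 to 84 hold KANJI. The sketch maps every
      other row to E. It cannot tell a from j, because in euc both are
      plain 7-bit bytes, and it does not handle g.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not from the KAKASI sources.
 * Classifies the EUC-encoded character at s[0..n-1] as one of the
 * mnemonics a, k, E, H, K, J and stores its length in bytes in *len.
 * Returns '?' for bytes this sketch does not recognize (e.g. SS3). */
char classify_euc(const unsigned char *s, size_t n, size_t *len)
{
    assert(s != NULL && len != NULL);
    if (n == 0) {
        *len = 0;
        return '?';
    }
    if (s[0] < 0x80) {                  /* 7-bit byte: ASCII (or JIS ROMAN) */
        *len = 1;
        return 'a';
    }
    if (s[0] == 0x8E && n >= 2 && s[1] >= 0xA1 && s[1] <= 0xDF) {
        *len = 2;                       /* SS2 + JIS x0201 KATAKANA */
        return 'k';
    }
    if (s[0] >= 0xA1 && s[0] <= 0xFE && n >= 2 && s[1] >= 0xA1 && s[1] <= 0xFE) {
        int row = s[0] - 0xA0;          /* JIS x0208 row (ku), 1..94 */
        *len = 2;
        if (row == 4)
            return 'H';                 /* row 4: HIRAGANA */
        if (row == 5)
            return 'K';                 /* row 5: KATAKANA */
        if (row >= 16 && row <= 84)
            return 'J';                 /* rows 16-84: KANJI */
        return 'E';                     /* symbols, digits, alphabets, ... */
    }
    *len = 1;
    return '?';
}
```

      For example, the euc bytes 0xB4 0xC1 (the kanji read "kan") are
      classified as J, and 0xA4 0xAB (hiragana "ka") as H.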

      -(from)(to) means conversion from character set (from) to (to).
      For example, the -JH option converts KANJI characters to
      HIRAGANA. The combinations in the following table are available.
      (You need not memorize it, because -h shows the same information.)

      to\from|    a    j    k    E    H     K    J    g
      -------+--------------------------------------------
         a   |    -    o    o1   o    o1    o1   o12  o
         j   |    o    -    o1   o    o1    o1   o12  o
         k   |              -         o     o    o2
         E   |    o    o         -                    o
         H   |              o         -     o    o2
         K   |              o         o     -

      o  -- converted.
      1  -- converted to Romaji.
      2  -- Kanji -> Kana conversion.
      blank -- conversion not available.
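      The table can also be read as data. The following C sketch
      transcribes it and checks whether an option such as -JH names an
      available conversion. The names FROM_SETS, conversion_supported
      and option_supported are hypothetical, not KAKASI's own. The
      notes 1 and 2 are dropped. As printed, the table has no entry for
      J -> K, so the sketch rejects -JK.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical transcription of the table above (not KAKASI source). */
static const char FROM_SETS[] = "ajkEHKJg";     /* column order */

/* One row per target set, one cell per FROM_SETS entry:
 * 'o' = available, '-' = same set, ' ' = not available. */
static const struct {
    char to;
    const char *cells;
} TABLE[] = {
    { 'a', "-ooooooo" },
    { 'j', "o-oooooo" },
    { 'k', "  - ooo " },
    { 'E', "oo -   o" },
    { 'H', "  o -oo " },
    { 'K', "  o o-  " },
};

/* Returns 1 if "-(from)(to)" is listed as available, 0 otherwise. */
int conversion_supported(char from, char to)
{
    const char *col;
    size_t i;

    if (from == '\0')
        return 0;
    col = strchr(FROM_SETS, from);
    if (col == NULL)
        return 0;
    for (i = 0; i < sizeof TABLE / sizeof TABLE[0]; i++) {
        /* every row must have exactly one cell per source set */
        assert(strlen(TABLE[i].cells) == sizeof FROM_SETS - 1);
        if (TABLE[i].to == to)
            return TABLE[i].cells[col - FROM_SETS] == 'o';
    }
    return 0;
}

/* Checks an option string of the form "-JH". */
int option_supported(const char *opt)
{
    return opt != NULL && strlen(opt) == 3 && opt[0] == '-'
        && conversion_supported(opt[1], opt[2]);
}
```

      With this data, every option used in the Examples section
      (-JH, -Hk, -Kk, -Jk, -Ej, -Ha, -Ka, -Ja, -Ea, -ka) is accepted.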

   c. Kanji coding conversion.

      Unfortunately, several coding systems are used in Japan, and the
      JIS x0208 standard was revised in 1983. KAKASI therefore detects
      the coding system and revision of the input automatically and,
      if the document does not include JIS x0201 KATAKANA, uses the
      same coding system for output. If JIS x0201 KATAKANA is
      included, or if you wish to change the kanji coding system, use
      the following options.

      -i : input coding
      -o : output coding

      jis -- Widely used on the Internet (e.g. the fj and jp newsgroups).
             Based on ISO-2022.
             newjis: JISx0208 (1983) designated by ESC-$-B.
             oldjis: JISx0208 (1978) designated by ESC-$-@.
      euc,dec -- Often used on UNIX-like systems. JISx0208 is
             assigned to GR (the MSB is 1). The major difference
             between euc and dec is how JISx0201 KATAKANA and the DEC
             graphic characters are assigned.
      sjis -- Defined by Microsoft Corp. (together with ASCII Corp.).
             Widely used on personal computers (MS-DOS, Mac, ...).
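      This document does not describe how KAKASI's automatic detection
      works. As a rough idea of how such detection can be done, the C
      sketch below looks for the ISO-2022 escape sequences of jis. If
      it finds none, it looks for byte values that can occur in only
      one of euc and sjis. All names in it are hypothetical.

      The sketch also shows how one JIS x0208 character is re-coded.
      For euc, the MSB of both bytes is set (GR). For sjis, the usual
      arithmetic mapping packs two JIS rows into one lead byte.

      The sketch ignores several things: the code points that changed
      between 1978 and 1983, JIS x0201 KATAKANA, and dec.

```c
#include <assert.h>
#include <stddef.h>

enum coding { C_ASCII, C_NEWJIS, C_OLDJIS, C_EUC, C_SJIS };

/* Hypothetical heuristic, not KAKASI's actual algorithm. */
enum coding guess_coding(const unsigned char *s, size_t n)
{
    int high = 0;                       /* saw any byte with the MSB set */
    size_t i;

    assert(s != NULL || n == 0);
    for (i = 0; i < n; i++) {
        /* ISO-2022 designation of JIS x0208: ESC $ B (1983), ESC $ @ (1978) */
        if (s[i] == 0x1B && i + 2 < n && s[i + 1] == '$') {
            if (s[i + 2] == 'B')
                return C_NEWJIS;
            if (s[i + 2] == '@')
                return C_OLDJIS;
        }
        if (s[i] >= 0x80)
            high = 1;
        /* 0x81-0x9F (except SS2/SS3) never occurs in euc: sjis lead byte */
        if (s[i] >= 0x81 && s[i] <= 0x9F && s[i] != 0x8E && s[i] != 0x8F)
            return C_SJIS;
        /* 0xFD-0xFE never occurs in sjis */
        if (s[i] == 0xFD || s[i] == 0xFE)
            return C_EUC;
    }
    /* Still ambiguous: prefer euc for 8-bit text, as on UNIX. */
    return high ? C_EUC : C_ASCII;
}

/* One JIS x0208 character as a byte pair (7-bit bytes in jis). */
struct pair { unsigned char b1, b2; };

static int is_jis(struct pair j)
{
    return j.b1 >= 0x21 && j.b1 <= 0x7E && j.b2 >= 0x21 && j.b2 <= 0x7E;
}

/* euc: JIS x0208 assigned to GR, i.e. the MSB of both bytes set. */
struct pair jis_to_euc(struct pair j)
{
    struct pair e;
    assert(is_jis(j));
    e.b1 = (unsigned char)(j.b1 | 0x80);
    e.b2 = (unsigned char)(j.b2 | 0x80);
    return e;
}

/* sjis: two JIS rows share one lead byte. */
struct pair jis_to_sjis(struct pair j)
{
    struct pair s;
    assert(is_jis(j));
    s.b1 = (unsigned char)(((j.b1 + 1) >> 1) + (j.b1 <= 0x5E ? 0x70 : 0xB0));
    if (j.b1 & 1)                       /* odd row: trail 0x40-0x9E, skipping 0x7F */
        s.b2 = (unsigned char)(j.b2 + (j.b2 < 0x60 ? 0x1F : 0x20));
    else                                /* even row: trail 0x9F-0xFC */
        s.b2 = (unsigned char)(j.b2 + 0x7E);
    return s;
}
```

      For example, the kanji read "kan" is 0x34 0x41 in jis, 0xB4 0xC1
      in euc, and 0x8A 0xBF in sjis. A short euc text that uses only
      bytes 0xA1-0xDF can also be read as sjis half-width KATAKANA,
      which is why the sketch falls back to a default.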

   d. Romaji conversion.

      There are two methods of writing romaji. The first is the Kunrei
      method, defined by the Japanese government, and the second is
      the Hepburn method. I think the Hepburn method reads more
      naturally to foreigners.

      -rhepburn : Hepburn Method (default)
      -rkunrei  : Kunrei Method
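      To make the difference concrete, here is a C sketch that maps a
      syllable written by the Kunrei method to its Hepburn spelling.
      The table is partial, and kunrei_to_hepburn is a hypothetical
      name, not part of KAKASI. Syllables not listed are spelled the
      same way in both methods in this sketch.

```c
#include <assert.h>
#include <string.h>

/* Some syllables whose spelling differs between the Kunrei and
 * Hepburn methods (partial list, for illustration only). */
static const struct {
    const char *kunrei, *hepburn;
} DIFFS[] = {
    { "si", "shi" }, { "sya", "sha" }, { "syu", "shu" }, { "syo", "sho" },
    { "ti", "chi" }, { "tya", "cha" }, { "tyu", "chu" }, { "tyo", "cho" },
    { "tu", "tsu" }, { "hu", "fu" },
    { "zi", "ji" },  { "zya", "ja" },  { "zyu", "ju" },  { "zyo", "jo" },
};

/* Returns the Hepburn spelling of one Kunrei syllable. */
const char *kunrei_to_hepburn(const char *syllable)
{
    size_t i;

    assert(syllable != NULL);
    for (i = 0; i < sizeof DIFFS / sizeof DIFFS[0]; i++)
        if (strcmp(DIFFS[i].kunrei, syllable) == 0)
            return DIFFS[i].hepburn;
    return syllable;    /* e.g. "ka", "na": identical in both methods */
}
```

      For example, the word written ti-ka-te-tu by the Kunrei method
      becomes chi-ka-te-tsu by the Hepburn method.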

   e. Kanji kana conversion options, used with the -J? options.

      -p: List all possible readings. If a word has two or more
          possible readings, KAKASI shows them in braces: {aaa,bbb}.
      -s: Insert a separator between words.
      -f: Furigana mode. Shows each original kanji word with its reading.
      -c: Skip the given characters within a word (default: TAB, CR,
          LF, BLANK).
      -C: Capitalize romaji words (with the -Ja or -Jj option).
      -U: Upcase romaji words (with the -Ja or -Jj option).
      -u: Call fflush() to flush the output.
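      The output conventions of -p, -C and -U can be sketched in C as
      follows. This is an illustration only. The function names are
      hypothetical, the readings are assumed to be romaji already, and
      KAKASI's exact output may differ in details.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* -p style: one reading is written as is; two or more readings are
 * joined with commas inside braces, e.g. {aaa,bbb}.
 * Returns 0 on success, -1 if out (outsz bytes) is too small. */
int format_readings(const char *const *readings, size_t n,
                    char *out, size_t outsz)
{
    size_t i, used = 0;
    int w;

    assert(readings != NULL && n > 0 && out != NULL && outsz > 0);
    if (n == 1) {
        w = snprintf(out, outsz, "%s", readings[0]);
        return (w < 0 || (size_t)w >= outsz) ? -1 : 0;
    }
    for (i = 0; i < n; i++) {
        w = snprintf(out + used, outsz - used, "%c%s",
                     i == 0 ? '{' : ',', readings[i]);
        if (w < 0 || (size_t)w >= outsz - used)
            return -1;
        used += (size_t)w;
    }
    w = snprintf(out + used, outsz - used, "}");
    return (w < 0 || (size_t)w >= outsz - used) ? -1 : 0;
}

/* -C style: capitalize the first letter of a romaji word. */
void capitalize_word(char *word)
{
    assert(word != NULL);
    if (word[0] != '\0')
        word[0] = (char)toupper((unsigned char)word[0]);
}

/* -U style: upcase every letter of a romaji word. */
void upcase_word(char *word)
{
    assert(word != NULL);
    for (; *word != '\0'; word++)
        *word = (char)toupper((unsigned char)*word);
}
```

      For example, the readings "kan" and "ken" are formatted as
      {kan,ken}. A single reading "kanji" is left as is, and becomes
      "Kanji" with -C or "KANJI" with -U.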

* Examples

  a. All kanji characters are converted to hiragana.

	kakasi -JH < input_file > output_file

  b. All JIS x0208 characters are converted to JIS x0201.

	kakasi -Hk -Kk -Jk -Ej < input_file > output_file

  c. All characters are converted to ASCII, and words are separated.

	kakasi -Ha -Ka -Ja -Ea -ka -s < input_file > output_file

  d. Read news without kanji.

	rn | kakasi -JH -c'>'
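  Example d adds '>' to the -c skip characters. The C sketch below
  illustrates the idea, under two assumptions based only on the -c
  description above. First, when a word is looked up, characters in
  the skip set are passed over, so a word broken by a line feed and a
  "> " quote prefix can still be matched as one word. Second, the
  characters given to -c are added to the defaults.

  gather_word and DEFAULT_SKIP are hypothetical names. For brevity
  the sketch uses single-byte characters as stand-ins for two-byte
  kanji.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Default skip set, as described for -c: TAB, CR, LF and BLANK. */
#define DEFAULT_SKIP "\t\r\n "

/* Hypothetical sketch, not KAKASI source: copies up to max_chars
 * characters of src into out (outsz bytes, NUL-terminated), passing over
 * any character in skip. Returns the number of bytes of src consumed.
 * Single-byte characters only, for brevity. */
size_t gather_word(const char *src, size_t max_chars, const char *skip,
                   char *out, size_t outsz)
{
    size_t consumed = 0, got = 0;

    assert(src != NULL && skip != NULL && out != NULL && outsz > 0);
    while (src[consumed] != '\0' && got < max_chars && got + 1 < outsz) {
        char c = src[consumed++];
        if (strchr(skip, c) == NULL)    /* c is never '\0' here */
            out[got++] = c;
    }
    out[got] = '\0';
    return consumed;
}
```

  With the skip set DEFAULT_SKIP ">", the text "ab\n> cd" yields the
  four characters "abcd". With the defaults alone, the '>' is kept and
  breaks the word.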

