Next: 5. Status Up: Portable Spell Checker Interface Previous: 3. Keeping in Touch   Contents

4. Library Interface

4.1 Overview

The Pspell library contains two main classes and several helper classes. The two main classes are PspellConfig and PspellManager. The PspellConfig class is used to set initial defaults and to change spell checker specific options. The PspellManager class does most of the real work. It is responsible for managing the dictionaries, checking whether a word is in the dictionary, and coming up with suggestions, among other things. There are many helper classes; the important ones are PspellWordList, PspellMutableWordList, and Pspell*Emulation. The PspellWordList class is used for accessing the suggestion list, as well as the personal and session word lists currently in use. The PspellMutableWordList class is used to manage the personal, and perhaps other, word lists. The Pspell*Emulation classes are used for iterating through a list.

Both a C and a C++ interface are provided. I recommend using the C interface, even if your program is in C++, to avoid some of the nasty issues associated with C++ linkage. In general one can only use C++ linkage if both the library and the program were built with the same compiler. I may eventually provide C++ wrapper classes, including a few STL-like ones, for the C library and remove the existing C++ interface altogether.

The mapping between the C and C++ interfaces is straightforward. Going from C++ to C, it is as follows:

<class name in lowercase with underscores>_<method name>([const] <Class> *, <other parameters if any>)
For example ``PspellManager::lang_name() const'' would become ``pspell_manager_lang_name(const PspellManager *)''.

Methods that return a bool will instead return an int in the C interface.
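
The naming rule above can be sketched in a few lines of C. The helper c_name_for_method is hypothetical and is not part of Pspell; it only illustrates how a class name and method name combine into the C function name, assuming that each upper-case letter after the first starts a new underscore-separated word.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of Pspell): derives the C function name
 * from a C++ class name and a method name using the mapping rule.
 * Returns 0 on success, -1 if the output buffer is too small. */
int c_name_for_method(const char *class_name, const char *method,
                      char *out, size_t out_size)
{
    size_t n = 0;
    assert(class_name != NULL && method != NULL && out != NULL);

    for (size_t i = 0; class_name[i] != '\0'; ++i) {
        unsigned char c = (unsigned char)class_name[i];
        /* An upper-case letter after the first character starts a new word. */
        if (isupper(c) && i > 0) {
            if (n + 1 >= out_size)
                return -1;
            out[n++] = '_';
        }
        if (n + 1 >= out_size)
            return -1;
        out[n++] = (char)tolower(c);
    }

    /* Join the class part and the method name with an underscore. */
    size_t method_len = strlen(method);
    if (n + 1 + method_len + 1 > out_size)
        return -1;
    out[n++] = '_';
    memcpy(out + n, method, method_len + 1);
    return 0;
}
```

Called with ``PspellManager'' and ``lang_name'', the helper fills the buffer with the same C name used in the example above.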

4.2 Usage

To use pspell your application should include ``pspell/pspell.h''. To ensure that all the necessary libraries are linked in, libtool should be used to perform the linking. When using libtool, linking with ``-lpspell'' should be all that is necessary. With shared libraries you might be able to link with ``-lpspell'' without libtool, but this is not recommended. This version of Pspell uses the CVS version of libtool (multi-language-branch); however, released versions of libtool should also work.

When your application first starts you should get a new configuration class with the command:

PspellConfig * spell_config = new_pspell_config();
which will create a new PspellConfig class. It is allocated with new and it is your responsibility to delete it with delete_pspell_config. The standard C++ delete can be used if the compiler is compatible with the one used to create the Pspell library. Once you have the config class you should set some variables. The most important one is the language variable. To do so use the command:

pspell_config_replace(spell_config, "language-tag", "en_US");
which will set the default language to American English. The language is expected to be the standard two letter ISO 639 language code, with an optional two letter ISO 3166 country code after an underscore. You can set the preferred spelling via the ``spelling'' option, any extra info via the ``jargon'' option, and the encoding via the ``encoding'' option. Other things you might want to set are the preferred spell checker to use, the search path for dictionaries, and the like. See section 4.4 for the available options.
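
If the language tag comes from user input, it may be worth checking its shape before handing it to pspell_config_replace. The following is a minimal sketch, not part of Pspell: struct lang_tag and parse_lang_tag are hypothetical names. It only checks the form (two letters, optionally followed by a separator and two more letters) and does not verify that the codes actually exist in ISO 639 or ISO 3166. It accepts a dash as well as an underscore, as the language-tag entry in section 4.4 allows.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical illustration type, not a Pspell structure. */
struct lang_tag {
    char language[3];  /* two letter language code, NUL-terminated */
    char country[3];   /* two letter country code, or "" if absent */
};

/* Returns non-zero if s starts with two alphabetic characters. */
static int two_letters(const char *s)
{
    return isalpha((unsigned char)s[0]) && isalpha((unsigned char)s[1]);
}

/* Returns 1 and fills *out if tag has the form "en", "en_US" or "en-US".
 * Returns 0 and leaves *out untouched otherwise. */
int parse_lang_tag(const char *tag, struct lang_tag *out)
{
    assert(tag != NULL && out != NULL);

    size_t len = strlen(tag);
    if (len != 2 && len != 5)
        return 0;
    if (!two_letters(tag))
        return 0;
    if (len == 5 &&
        ((tag[2] != '_' && tag[2] != '-') || !two_letters(tag + 3)))
        return 0;

    memcpy(out->language, tag, 2);
    out->language[2] = '\0';
    out->country[0] = '\0';
    if (len == 5) {
        memcpy(out->country, tag + 3, 2);
        out->country[2] = '\0';
    }
    return 1;
}
```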

Whenever a new document is created a new PspellManager class should also be created. There should be one manager class per document. To create a new manager class use new_pspell_manager and then cast the result using to_pspell_manager like so:

PspellCanHaveError * possible_err = new_pspell_manager(spell_config);
PspellManager * spell_checker = 0;
if (pspell_error_number(possible_err) != 0)
  puts(pspell_error_message(possible_err));
else
  spell_checker = to_pspell_manager(possible_err);
which will create a new PspellManager class using the defaults found in spell_config. If C++ is being used AND the compiler is compatible with the one used to create the Pspell library a normal cast can be used instead of to_pspell_manager.

If for some reason you want to use different defaults simply clone spell_config and change the setting like so:

PspellConfig * spell_config2 = pspell_config_clone(spell_config);
pspell_config_replace(spell_config2, "language-tag","nl");
possible_err = new_pspell_manager(spell_config2);
delete_pspell_config(spell_config2);
Once again in C++ delete_pspell_config can be replaced with a simple C++ delete. Once the manager class is created you can use the check method to see if a word in the document is correct like so:

int correct = pspell_manager_check(spell_checker, <word>, <size>);
<word> is expected to be a const char * character string. If the encoding is set to ``machine unsigned 16'' or ``machine unsigned 32'', <word> is expected to be a cast from either const u16int * or const u32int * respectively. u16int and u32int are generally unsigned short and unsigned int respectively. <size> is the length of the string, or -1 if the string is null terminated. If the string is a cast from const u16int * or const u32int *, then <size> is the amount of space in bytes the string takes up after being cast to const char *, not the number of characters in the string. pspell_manager_check will return 0 if the word is not found and non-zero otherwise.
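
The byte count for a 16-bit string is easy to get wrong, so here is a small sketch of how it might be computed. The helper u16_size_in_bytes is hypothetical and not part of Pspell. It uses unsigned short for the character type, as described above, and it assumes that the terminating zero character is not counted in <size>.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of Pspell): number of bytes occupied by a
 * zero-terminated string of 16-bit characters, not counting the terminator.
 * Assumption: this is the value <size> expects once the buffer is passed
 * as a const char *. */
int u16_size_in_bytes(const unsigned short *str)
{
    assert(str != NULL);

    size_t chars = 0;
    while (str[chars] != 0)
        ++chars;

    /* Convert the character count into a byte count. */
    return (int)(chars * sizeof(unsigned short));
}
```

The result would then be passed as <size>, with the string itself cast to const char *, in the call to pspell_manager_check.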

If the word is not correct then the suggest method can be used to come up with likely replacements.

PspellWordList * suggestions = pspell_manager_suggest(spell_checker, 
                                                      <word>, <size>);
PspellStringEmulation * elements = pspell_word_list_elements(suggestions);
const char * word;
while ( (word = pspell_string_emulation_next(elements)) != NULL ) {
  // add to suggestion list
}
delete_pspell_string_emulation(elements);
Notice how elements is deleted but suggestions is not. The value returned by suggest is only valid until the next call to suggest. Once a replacement is made the store_repl method should be used to communicate the replacement pair back to the spell checker (see section 4.7.1 for why). Its usage is as follows:

pspell_manager_store_repl(spell_checker, 
                          <misspelled word>, <size>,
                          <correctly spelled word>, <size>);
If the user decides to add the word to the session or personal dictionary, the word can be added using the add_to_session or add_to_personal methods respectively like so:

pspell_manager_add_to_session|personal(spell_checker, <word>, <size>);
It is better to let the spell checker manage these words rather than doing it yourself so that the words have a chance of appearing in the suggestion list.

Finally, when the document is closed the PspellManager class should be deleted like so.

delete_pspell_manager(spell_checker);
The standard C++ delete should NOT be used here because it will not unload any shared libraries pulled in when the manager class is created.

4.3 Class Reference

Methods that return a bool generally return false on error and true otherwise. To find out what went wrong use the error_number and error_message methods. Unless otherwise stated, methods that return a const char * will return null on error. The character string returned is only valid until the next method which returns a const char * is called.

All methods are virtual and abstract, thus these classes are really abstract base classes. Therefore you cannot simply store the object directly. In order to make copies of the objects use the clone and assign methods if they are provided.

For the details of the various classes please see the header files. In the future I will generate class references using some automated tool.


4.4 Available Options

The following options are available to control which word list Pspell selects.

language-tag <string>
the language code which consists of the two letter ISO 639 language code and an optional two letter ISO 3166 country code after a dash or underscore.
spelling <string>
the requested spelling for languages with more than one spelling such as English. Known values are ``american'', ``british'', and ``canadian''. This information is normally inferred from the language-tag option. For example the language tag ``en_GB'' will set spelling to ``british''.
jargon <string>
any extra information to distinguish two different word lists that have the same language-tag and spelling.
word-list-path <list>
search path for word list information files
module-search-order <list>
list of available modules; modules that come first on this list have a higher priority
The following options control the behavior of the selected module. Not all modules support all options.

encoding <string>
encoding that words are expected to be in. Valid values are ``utf-8'', ``iso8859-*'', ``koi8-r'', ``viscii'', ``cp1252'', ``machine unsigned 16'', ``machine unsigned 32''.
ignore <int>
ignore all words which are not at least as long as the value for this setting
personal <file>
file name of the personal word list to use. Start it with ``./'' to look for the file in the current directory rather than the home directory.
repl <file>
file name of the replacement word list to use. Start it with ``./'' to look for the file in the current directory rather than the home directory.
save-repl <boolean>
save the replacement word list on calls to save_all_word_lists.
ignore-repl <boolean>
ignore calls to Manager::store_replacement.
sug-mode <string>
the suggestion mode, known values are fast, normal, and bad-spellers
run-together <boolean>
consider run-together words as legal compounds.
The following options may be examined to tell exactly what word list or module was selected

master
the full path of the word list selected
master-flags
any special flags that were passed on to the module
module
the module selected
The spelling and jargon options can also be examined.

<string> options may be set to anything, including in some cases an empty string. <int> options must be set to a valid integer string. <boolean> options must be set to ``true'' or ``false''. <list> options cannot be set directly; you must use the option add-<option> to add an item to the list, rem-<option> to remove an item, or rem-all-<option> to remove all the items. In the case of rem-all-<option> the value should be an empty string. Although the standard retrieve method will work for a <list> option, it should not be used, as the format of the returned string is implementation dependent. Use the retrieve_list method instead.
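
The rules for <list> and <boolean> options can be sketched with two small helpers. Both list_option_key and is_valid_boolean_value are hypothetical names, not Pspell functions. The first builds the derived option name, the second checks a boolean value before it is set.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of Pspell): builds "add-<option>",
 * "rem-<option>" or "rem-all-<option>" from the prefix "add", "rem" or
 * "rem-all". Returns 0 on success, -1 if the result did not fit in out. */
int list_option_key(const char *prefix, const char *option,
                    char *out, size_t out_size)
{
    assert(prefix != NULL && option != NULL && out != NULL);

    int n = snprintf(out, out_size, "%s-%s", prefix, option);
    return (n >= 0 && (size_t)n < out_size) ? 0 : -1;
}

/* <boolean> options must be exactly "true" or "false". */
int is_valid_boolean_value(const char *value)
{
    assert(value != NULL);
    return strcmp(value, "true") == 0 || strcmp(value, "false") == 0;
}
```

The resulting key, for example ``add-word-list-path'', would then be passed to pspell_config_replace together with the item to add, or with an empty string in the rem-all case.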

4.5 Format of the PWLI Files

In order for Pspell to know which word lists to use, each word list must have at least one PWLI file in the pspell data directory, which is normally /usr/local/share/pspell/. Use ``pspell-config pkgdatadir'' to find out what it is on your system.

Each PWLI file has the following name:

<language>[-[<spelling>][-<jargon>]]-<module>.pwli
Where <language> is the two letter language code, <spelling> is the particular spelling you're interested in if the language has multiple spellings in different parts of the world (such as English), <jargon> is any extra information to distinguish the word list from other ones with the same language and spelling, and <module> is the pspell module the main word list is for.

For example:

en-aspell.pwli
en-american-aspell.pwli
en-american-medical-ispell.pwli
en-american-xlg-ispell.pwli
de--medical-ispell.pwli
Notice that if the spelling is left out but the jargon is not, there need to be two dashes between the language and the jargon.
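
The naming rule, including the double dash case, can be sketched as follows. The function pwli_file_name is a hypothetical helper and not part of Pspell. A NULL or empty spelling or jargon is treated as ``left out''.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of Pspell): builds
 * <language>[-[<spelling>][-<jargon>]]-<module>.pwli in out.
 * Returns 0 on success, -1 if the name did not fit in out. */
int pwli_file_name(const char *language, const char *spelling,
                   const char *jargon, const char *module,
                   char *out, size_t out_size)
{
    assert(language != NULL && module != NULL && out != NULL);

    int has_spelling = spelling != NULL && strlen(spelling) > 0;
    int has_jargon = jargon != NULL && strlen(jargon) > 0;
    int n;

    if (has_jargon) {
        /* With a jargon the spelling slot is always present, possibly
         * empty, which produces the two dashes. */
        n = snprintf(out, out_size, "%s-%s-%s-%s.pwli", language,
                     has_spelling ? spelling : "", jargon, module);
    } else if (has_spelling) {
        n = snprintf(out, out_size, "%s-%s-%s.pwli",
                     language, spelling, module);
    } else {
        n = snprintf(out, out_size, "%s-%s.pwli", language, module);
    }
    return (n >= 0 && (size_t)n < out_size) ? 0 : -1;
}
```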

Each PWLI file then contains exactly one line which contains the full path of the main word list, white space, then any additional options to pass onto the module.
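
Reading such a line back might look like the sketch below. The function split_pwli_line is hypothetical and not how Pspell itself necessarily parses the file. It assumes that the path contains no white space and that everything after the first run of white space is passed on as options.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical helper (not part of Pspell): splits the single line of a
 * PWLI file in place. On success *path points to the word list path and
 * *options to the remaining text (possibly ""). Returns 0 on success,
 * -1 if the line contains no path. */
int split_pwli_line(char *line, char **path, char **options)
{
    assert(line != NULL && path != NULL && options != NULL);

    /* Strip trailing white space, including the newline. */
    size_t len = strlen(line);
    while (len > 0 && isspace((unsigned char)line[len - 1]))
        line[--len] = '\0';

    /* Skip leading white space. */
    char *p = line;
    while (isspace((unsigned char)*p))
        ++p;
    if (*p == '\0')
        return -1;
    *path = p;

    /* The path runs up to the first white space character. */
    while (*p != '\0' && !isspace((unsigned char)*p))
        ++p;
    if (*p != '\0') {
        *p++ = '\0';
        while (isspace((unsigned char)*p))
            ++p;
    }
    *options = p;
    return 0;
}
```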

4.6 Examples

Two simple examples are included in the examples directory. Pspell must be installed before they will compile and at least one pspell module must be installed before they will run. To build the C example type ``make example-c'' and to build the C++ example type ``make example-cxx''.

4.7 Rationale


4.7.1 store_repl method

This method is needed because Aspell (http://aspell.sourceforge.net/) is able to learn from users' misspellings. For example, on the first pass a user misspells beginning as beging, so aspell suggests:

begging, begin, being, Beijing, bagging, ....
However the user then tries "begning" and aspell suggests

beginning, beaning, begging, ...
so the user selects beginning. However, later on in the document the user misspells it as begng (NOT beging). Normally aspell would suggest:

began, begging, begin, begun, ....
However, because it knows the user misspelled beginning as beging, it will instead suggest:

beginning, began, begging, begin, begun ...
I myself often misspelled beginning (and still do) as something close to begging, and too many times wind up writing sentences such as "begging with ....".


Kevin Atkinson 2001-05-29