Component-based stemming engine for Malay text / Juhari ljam



Bibliographic Details
Main Author: Juhari, ljam
Format: Thesis
Published: 2003
Summary: Word stemming is an important feature of present-day indexing and search systems. The idea is to improve recall by handling word endings automatically, reducing words to their roots at both indexing and search time. Various stemming algorithms have been developed for English and other languages, but stemming is still new for Malay text. Moreover, most of the existing Malay stemmers have contributed little to later development or applications, because they cannot be reused in other applications. This project studies those earlier efforts and proposes a new algorithm to improve the performance of the stemming process. Its most important contribution is a component-based approach, from which many applications can be derived. The main reason for building a component is reusability: anyone building a system related to information retrieval (IR) or word stemming no longer needs to build a stemming engine, and can simply call the component and use its output. The project is, however, limited to a specific domain: generic Malay words.
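
The reuse argument in the summary can be illustrated with a minimal Python sketch in which one stemmer object is shared by an indexer and a search routine. This is not the algorithm proposed in the thesis, which the record does not describe. The class name `RootLookupStemmer`, the functions `build_index` and `search`, the short affix lists, and the three-word root lexicon are all assumptions made for the example.

```python
class RootLookupStemmer:
    """Illustrative stemmer component: strips one prefix and/or one suffix
    and accepts the result only if it appears in a root lexicon."""

    # Small sample of common Malay affixes (assumed, far from complete).
    PREFIXES = ("meng", "mem", "men", "ber", "ter", "per", "di", "ke", "se", "me")
    SUFFIXES = ("kan", "nya", "lah", "an", "i")

    def __init__(self, roots):
        self.roots = set(roots)

    def stem(self, word):
        word = word.lower()
        if word in self.roots:
            return word
        # Try every (optional prefix, optional suffix) combination.
        for prefix in ("",) + self.PREFIXES:
            if not word.startswith(prefix):
                continue
            rest = word[len(prefix):]
            for suffix in ("",) + self.SUFFIXES:
                if suffix and not rest.endswith(suffix):
                    continue
                candidate = rest[: len(rest) - len(suffix)]
                if candidate in self.roots:
                    return candidate
        # Unknown words are returned unchanged rather than guessed at.
        return word


def build_index(docs, stemmer):
    """Map each stem to the set of document ids that contain it."""
    index = {}
    for doc_id, text in docs.items():
        for token in text.split():
            index.setdefault(stemmer.stem(token), set()).add(doc_id)
    return index


def search(query, index, stemmer):
    """Stem the query with the same component used at indexing time."""
    return sorted(index.get(stemmer.stem(query), set()))


stemmer = RootLookupStemmer(roots={"makan", "baca", "main"})
docs = {1: "Dia membaca buku", 2: "Makanan itu dimakan", 3: "Mereka bermain"}
index = build_index(docs, stemmer)
print(search("bacalah", index, stemmer))  # [1]
```

Because the indexer and the search routine call the same `stem` method, the query "bacalah" and the indexed word "membaca" both reduce to "baca" and match, which is the recall improvement the summary refers to. Any other application could reuse the same object without reimplementing the stemming rules.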