08 September 2004

Stemming

The Porter stemming algorithm "is a process for removing the commoner morphological and inflexional endings from words in English. Its main use is as part of a term normalisation process that is usually done when setting up information retrieval systems."
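
To give a feel for what "removing endings" means, here is a minimal Python sketch of just the first rule group of the published algorithm (step 1a, which handles plural endings). It is not the full stemmer, and the function name `porter_step_1a` is my own, not taken from any of the implementations mentioned here.

```python
def porter_step_1a(word: str) -> str:
    """Apply only step 1a of the Porter algorithm (plural endings).

    Illustrative sketch: the real stemmer has several further steps
    that depend on the consonant/vowel pattern of the remaining stem.
    """
    word = word.lower()
    if word.endswith("sses"):
        return word[:-2]   # SSES -> SS   (caresses -> caress)
    if word.endswith("ies"):
        return word[:-2]   # IES  -> I    (ponies -> poni)
    if word.endswith("ss"):
        return word        # SS   -> SS   (caress -> caress)
    if word.endswith("s"):
        return word[:-1]   # S    -> ""   (cats -> cat)
    return word


if __name__ == "__main__":
    for w in ["caresses", "ponies", "caress", "cats"]:
        print(w, "->", porter_step_1a(w))
```

Note that the result ("poni") need not be a dictionary word. For retrieval purposes it is enough that related forms of a word collapse to the same term.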

Martin Porter's official home page for his algorithm lists implementations in C, Java, Perl, Python, C#, .NET, Common Lisp, Ruby, VB, PHP, Delphi and JavaScript.

That page also contains a link to Snowball, his string processing language designed for creating stemming algorithms. Lucene's implementation of Snowball can be found here.


Posted by ngps at 01:10