Computers have long been used to solve problems automatically through complex programs that assist a user. Computational linguistics, also known as natural language processing (NLP), is a field at the intersection of computer science and linguistics that deals with the analysis and processing of human languages using computers. NLP has many applications, including automatic summarization, machine translation, part-of-speech (POS) tagging, automatic speech recognition (ASR), optical character recognition (OCR), and information retrieval (IR). Spell checking is yet another significant application of computational linguistics, with research extending back to the early seventies when Ralph ...
The more words a spell checker's dictionary contains, the higher its error detection rate. Without a large dictionary, the spell checker would let errors pass undetected. However, the dictionary should not be extremely large either, because some misspellings would then be mistaken for other valid words. A spell checker is a computer program that spots, and often corrects, misspelled words in a text document. It can be a standalone application or an add-on module built into an existing program such as a word processor or search engine. A spell checker is built from three components: an error detector that detects misspelled words, a candidate generator that provides spelling suggestions for the detected errors, and an error corrector that chooses the best correction from the list of candidate spellings. All three components are usually connected to a core dictionary of words that the program uses to validate the words in the text being checked. The most basic spelling correction algorithm follows this template:
if isMistake(w):
    candidates = getCandidates(w)
    suggestions = filterAndRank(candidates)
    return suggestions
else:
    return IS_CORRECT
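The template can be turned into a small runnable sketch. The Python version below is illustrative only: the toy `DICTIONARY`, the length-based candidate generator, and the position-matching score are all assumptions standing in for the three components, not the method of any particular spell checker.

```python
# Hypothetical toy dictionary; a real checker would load a far larger word list.
DICTIONARY = {"it", "is", "as", "raining", "today", "rain", "training"}


def is_mistake(word):
    """Error detector: flag any word that is not in the dictionary."""
    return word.lower() not in DICTIONARY


def get_candidates(word):
    """Candidate generator (placeholder): dictionary words of similar length."""
    return [w for w in DICTIONARY if abs(len(w) - len(word)) <= 1]


def filter_and_rank(word, candidates):
    """Error corrector (placeholder): prefer candidates that share more
    letters with the input at the same positions; ties break alphabetically."""
    def score(candidate):
        return sum(a == b for a, b in zip(word, candidate))
    return sorted(candidates, key=lambda c: (-score(c), c))


def check(word):
    """Follow the template: detect first, then generate and rank candidates."""
    word = word.lower()
    if is_mistake(word):
        return filter_and_rank(word, get_candidates(word))
    return "IS CORRECT"
```

For example, `check("raining")` takes the else branch because the word is in the toy dictionary, while `check("rainng")` returns a ranked list of suggestions with "raining" first.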
Error detection runs first, before any corrections or suggestions are offered, to avoid making suggestions for words that are already correct. Then candidate words are generated. A candidate is a word that is a likely correction for the detected error. This process can yield hundreds or even thousands of candidate words, which is why candidates are ranked by an internal algorithm that assigns a score to each one. The highest-scoring candidates are presented as the actual spelling suggestions.
The purpose of a spell corrector is to find and correct spelling errors in typewritten text. Around 80% of all misspelled words contain a single error: a transposition of two letters, an extra letter, a missing letter, or a mistyped letter. Under this assumption, the correct word differs from the misspelled word by a single edit.
The problem of real-word errors should also be noted. A real-word error occurs when the spelling of a word can be found in the dictionary but it is not the correct word at that location. For example, in the sentence "It as raining today.", the word "as" should be "is". This situation can arise when the dictionary is too large and contains many rarely used words. For English, it has been found to occur when the dictionary contains more than 90,000 words. Problems like this depend on the context in which the word is used, so algorithms that analyze the grammatical structure of the text are needed. This is closely related to natural language understanding and processing.
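The single-error assumption suggests a simple way to generate candidates: enumerate every string that is one edit away from the misspelled word and keep those found in the dictionary. The Python sketch below illustrates this under stated assumptions. The function names and the `word_counts` mapping (word to corpus frequency) are hypothetical, and ranking by frequency is just one possible scoring scheme, not the one the text prescribes.

```python
import string


def single_edit_candidates(word, alphabet=string.ascii_lowercase):
    """Return every string that is exactly one edit away from `word`."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    # Deleting a letter repairs an accidentally added extra letter.
    deletions = [left + right[1:] for left, right in splits if right]
    # Swapping adjacent letters repairs a transposition.
    transpositions = [left + right[1] + right[0] + right[2:]
                      for left, right in splits if len(right) > 1]
    # Replacing a letter repairs a mistyped letter.
    substitutions = [left + c + right[1:]
                     for left, right in splits if right for c in alphabet]
    # Inserting a letter repairs an omitted letter.
    insertions = [left + c + right for left, right in splits for c in alphabet]
    return set(deletions + transpositions + substitutions + insertions)


def suggest(word, word_counts):
    """Keep candidates that are known words, most frequent first.

    `word_counts` is an assumed word-to-frequency mapping used as the score.
    """
    known = [c for c in single_edit_candidates(word) if c in word_counts]
    return sorted(known, key=lambda c: (-word_counts[c], c))
```

With a hypothetical frequency table such as `{"the": 100, "then": 20, "they": 15, "he": 30}`, calling `suggest("teh", ...)` yields only "the", since it is the one known word reachable by a single transposition. Note that this approach cannot catch real-word errors such as "as" for "is", because the detector never flags a word that is already in the dictionary.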
N-gram analysis is a method to find...