Outdated Name Recognition Tech May Compromise Marketing Efforts

By John C. Hermansen

Accurate data is a critical aspect of any insurer's or agent's store of information and is a key to successful marketing and customer service. The personal names of individual clients or customers, of course, are a vital element.

Most aspects of automated business practice are continuously monitored and improved. However, the science of understanding and automating the correct handling of personal names continues to rely on a technique developed in the early 1900s.

Essentially a method for building a “key” from a string of characters, the system, known as Soundex, has been proven ineffective time and again.

The Soundex algorithm ignores common variations in personal names that arise from culture, language, alternate spellings or diminutives. The technology fails to return correct matches that exist in databases, while at the same time returning a large number of unrelated “false positive” matches. Yet almost every personal name handling system in use by business today still relies on the one-size-fits-all, key-based techniques pioneered by Soundex more than 80 years ago!
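To make the key-building idea concrete, here is a minimal sketch of the commonly published American Soundex rules: keep the first letter, map the remaining consonants to digits, collapse repeated codes and pad the key to four characters. The function name and the sample names are chosen for illustration only; they are not taken from any particular vendor's system.

```python
# Letter-to-digit table from the commonly published American Soundex rules.
CODES = {}
for digit, letters in {"1": "BFPV", "2": "CGJKQSXZ", "3": "DT",
                       "4": "L", "5": "MN", "6": "R"}.items():
    for ch in letters:
        CODES[ch] = digit


def soundex(name):
    """Build a four-character Soundex key (illustrative sketch)."""
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    key = letters[0]
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        code = CODES.get(ch, "")
        if code and code != prev:
            key += code
        prev = code  # a vowel resets prev, so a repeated code is kept
    return (key + "000")[:4]


# Missed matches: the same person or name gets different keys.
print(soundex("Catherine"), soundex("Katherine"))  # C365 K365
print(soundex("William"), soundex("Bill"))         # W450 B400
print(soundex("Qaddafi"), soundex("Gaddafi"))      # Q310 G310

# False positives: unrelated names share one key.
print(soundex("Robert"), soundex("Rupert"))        # R163 R163
print(soundex("Lee"), soundex("Liu"))              # L000 L000
```

Because the first letter is always kept as is, a spelling or transliteration difference at the start of a name guarantees a miss, while short names from entirely different cultures can collapse into the same key.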

This leads to a lack of integrity among business transactions, reduced knowledge of the best customers and increased risk of working with customers linked to criminal or terrorist activities. Even worse, it alienates customers and the public at large when people see how poorly these systems distinguish the honest citizen from the terrorist.

Moreover, problems occur when the names on watch lists are not specific enough to identify a single individual and instead raise suspicion about a completely innocent person who, coincidentally, has the same name. Worse yet, heightened fears about terrorism have made many of us who are unfamiliar with names from other cultures suspect all people who bear names that look similar to a name on a watch list.

Most Americans simply are unaware of the way that names from other cultures operate, and therefore often make mistakes when entering names into databases, searching for them or even marking them on paper forms.

This is especially true in cases where names are stored within large databases. These repositories include government, medical, educational and even commercial records kept about individuals. Problems arise when attempting to retrieve records from those databases.

How a name is stored within data records may, and often does, deviate in form from the way it is entered at the time of query. Needless to say, this can cause serious problems with personal names as they relate to databases and automatic matching and retrieval systems.

Names are more than just strings of characters. Names contain critical information such as titles, gender, marital status and more. They also contain variations, such as nicknames or alternate spellings.

Because names vary from culture to culture, it has proven extremely difficult to create a single method for automatically processing them. Hispanic names have a different structure and are used differently from Chinese names. Names not written in our alphabet, such as Russian, Thai, Korean, Arabic or Persian names, add another level of complexity, because there are multiple “standards” for transliterating them into the Roman character set used by our computer systems.
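The structural difference can be illustrated with a short sketch. It assumes two simplified conventions: a Hispanic name written as given name(s), paternal surname, maternal surname, and a Chinese name written family name first. The class and function names are hypothetical, and real names break these rules often enough that this is a teaching aid rather than a usable parser.

```python
from dataclasses import dataclass


@dataclass
class ParsedName:
    given: str
    family: str
    second_family: str = ""


def parse_naive(full_name):
    # One-size-fits-all rule: the last word is the family name.
    parts = full_name.split()
    return ParsedName(" ".join(parts[:-1]), parts[-1])


def parse_hispanic(full_name):
    # Assumed convention: given name(s), paternal surname, maternal surname.
    parts = full_name.split()
    if len(parts) >= 3:
        return ParsedName(" ".join(parts[:-2]), parts[-2], parts[-1])
    return ParsedName(" ".join(parts[:-1]), parts[-1])


def parse_chinese(full_name):
    # Assumed convention: family name first, given name after it.
    parts = full_name.split()
    return ParsedName(" ".join(parts[1:]), parts[0])


print(parse_naive("Gabriel García Márquez").family)     # Márquez
print(parse_hispanic("Gabriel García Márquez").family)  # García
print(parse_naive("Mao Zedong").family)                 # Zedong
print(parse_chinese("Mao Zedong").family)               # Mao
```

In both cases the naive rule files the record under the wrong family name, so a later search on the correct surname will not find it.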

Recently, corporations have recognized their poor understanding and handling of foreign names, especially those that are Arabic and Islamic in origin. Rushing to plug the gap, they consider installing additional name-matching fixes that address only these name types. They may quickly see a backlash from singling out Arab-Americans and Islamic-Americans for scrutiny.

Accusations of “racial profiling” from customers can then push organizations to “dumb down” their searches in order to keep the peace. We are presented at once with the need for cultural awareness and the fear of using it to improve our systems.

One way of enforcing fair and effective compliance is by using name-recognition tools that leverage computational linguistics, software engineering and statistics to ensure an accurate evaluation of all name types. In this way, a fair and uniform approach that uses specialized techniques for the storage, retrieval and comparison of names can be used for each different type of name.

With this approach, statistics are used to determine the probable culture and gender of a name, to identify variant spellings and to recognize matching names within a database, taking into account all possible character, cultural and phonetic variations.
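As a rough illustration of how statistics can suggest the probable culture of a name, the sketch below scores a surname against letter-pair frequencies drawn from sample surnames for each culture. This is a deliberately crude stand-in and not the method used by any commercial product. The two sample lists are tiny and hand-picked purely for demonstration, and the function names are hypothetical.

```python
from collections import Counter

# Tiny, hand-picked samples: illustrative only, not real training data.
SAMPLES = {
    "hispanic": ["garcia", "rodriguez", "martinez", "hernandez", "lopez"],
    "chinese": ["zhang", "wang", "chen", "huang", "zhao"],
}


def bigrams(word):
    """Return the adjacent letter pairs of a word, lowercased."""
    word = word.lower()
    return [word[i:i + 2] for i in range(len(word) - 1)]


# Letter-pair frequency table for each culture.
PROFILES = {
    culture: Counter(bg for name in names for bg in bigrams(name))
    for culture, names in SAMPLES.items()
}


def probable_culture(surname):
    """Pick the culture whose profile best covers the surname's letter pairs.

    Returns None when no profile shares any letter pair with the surname.
    """
    scores = {}
    for culture, profile in PROFILES.items():
        total = sum(profile.values())
        scores[culture] = sum(profile[bg] for bg in bigrams(surname)) / total
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


print(probable_culture("Gonzalez"))  # hispanic
print(probable_culture("Zhou"))      # chinese
print(probable_culture("Smith"))     # None
```

A production system would need far larger samples, many more cultures and a proper probabilistic model, but the principle is the same: the evidence comes from observed names rather than from one fixed rule applied to every name.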

Because this process is controlled entirely by the determination of the cultural type of the name, each different name type is handled according to techniques specifically designed for problems that are typical for that name type. Using culturally sensitive technology reduces chances of developing stereotyped responses. It ensures fair and equitable treatment with uniform software coverage of all name types.

Using empirically driven knowledge about names substantially improves precision in name matching and name searching, reducing the “false positives” associated with the key-based algorithms of the past. This is the type of technology used by the U.S. Intelligence and Border Protection agencies, and it is now available for commercial applications.

Poorly executed name matching will frustrate users and can result in business losses due to false positives. The crucial step in checking the names of your customers against watch lists and favored-customer lists is to do so uniformly across cultures. That way, you reduce false positives while treating all customers fairly.

John C. Hermansen is the founder and chief executive officer of Language Analysis Systems, a Herndon, Va.-based maker of name-recognition software. He can be reached at jack@las-inc.com.


Reproduced from National Underwriter Edition, July 14, 2003. Copyright 2003 by The National Underwriter Company in the serial publication. All rights reserved. Copyright in this article as an independent work may be held by the author.