When an Organisation is registered (or an existing one is updated), Connect ID searches for similar Organisations and returns a 409 status code if a duplicate is found.
The following fields are compared to determine whether there is a match (italic fields are optional):
- internationalName
- internationalShortName
- officialAddress.fullAddress
- officialAddress.Town
- officialAddress.Region
- officialAddress.PostalCode
- officialAddress.Country
- localName and localOrganisationNames.Name (fields are combined)
- localShortName and localOrganisationNames.shortName (fields are combined)
The algorithm works in the following way:
- it searches within the same member association only (parentOrganisationFIFAId and country must match)
- it searches only organisations with the same nature (organisationNature must match)
- a duplicate is returned only if both of the following conditions hold:
- the internationalName of the new (searched) organisation is similar to that of a registered organisation after squashing*, or matches it partially after normalization**, AND
- at least 55% (e.g. five out of the nine) of the remaining fields (see above) are also similar***
- otherwise there is no match
#matching fields / #non-empty fields | Match |
4/8; 4/7; 3/6; 2/5; 2/4; 1/3; 1/2 | No |
5/8; 5/7; 4/6; 3/5; 3/4; 2/3; 2/2 | Yes |
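The table above can be encoded directly as a lookup. The following is a minimal sketch that only restates the table rows; the names `MIN_MATCHING_FIELDS` and `enough_fields_match` are hypothetical, and no formula is derived because the text does not state one that reproduces every row.

```python
# Minimum number of matching fields per number of non-empty fields,
# copied row by row from the table above (not computed from a formula).
MIN_MATCHING_FIELDS = {2: 2, 3: 2, 4: 3, 5: 3, 6: 4, 7: 5, 8: 5}


def enough_fields_match(matching: int, non_empty: int) -> bool:
    """Return True when the table above lists matching/non_empty as a match."""
    if non_empty not in MIN_MATCHING_FIELDS:
        # The table only covers 2 to 8 non-empty fields.
        raise ValueError(f"no table entry for {non_empty} non-empty fields")
    return matching >= MIN_MATCHING_FIELDS[non_empty]
```

For example, `enough_fields_match(4, 7)` is False and `enough_fields_match(5, 7)` is True, as in the table.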
*Squashing consists of the following operations, performed in the given order:
- text transliteration
- ICU folding
- symbols and spaces removal
Levenshtein distance is set to auto (you can find the explanation below).
Registered | Squashed (registered) | Searched (new) | Squashed (searched) | Condition evaluation |
R. AUBEL | RAUBEL | R.AUBEL | RAUBEL | true |
D.V.K. IZEGEM | DVKIZEGEM | DVK EGEM | DVKEGEM | true |
ΕΡΜΗΣ ΑΠΟΛΛΩΝ | HERMESAPOLLO | HERMES APOLLO | HERMESAPOLLO | true |
R.S.C. TEMPLEUVOIS | RSCTEMPLEUVOIS | RSC TEMPLEUVE | RSCTEMPLEUVE | false |
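The squashing steps can be approximated with the Python standard library. This is only a sketch under stated assumptions: the function name `squash` is hypothetical, Unicode NFKD decomposition plus upper-casing stands in for ICU folding, and there is no real Any-Latin transliteration, so non-Latin names such as the Greek row above are not reproduced.

```python
import unicodedata


def squash(name: str) -> str:
    """Approximate squashing: fold accents and case, then drop symbols and spaces."""
    # Stand-in for transliteration and ICU folding (assumption): decompose
    # characters and drop combining marks, so that "é" becomes "e".
    decomposed = unicodedata.normalize("NFKD", name)
    folded = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Upper-case only to match how the table above prints squashed values.
    folded = folded.upper()
    # Symbols and spaces removal: keep letters and digits only.
    return "".join(ch for ch in folded if ch.isalnum())
```

With this sketch, `squash("R. AUBEL")` and `squash("R.AUBEL")` both give `RAUBEL`, and `squash("D.V.K. IZEGEM")` gives `DVKIZEGEM`, matching the Latin-script rows of the table.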
**Normalization consists of the following operations, performed in the given order:
- text transliteration
- ICU folding
- symbols removal
- stop words removal (full list of international name stop tokens can be found below)
Two string fields are considered similar if up to 50% of the tokens (words in the phrase) are not similar. The number computed from this percentage is rounded down, before being subtracted from the total to determine the minimum. For example:
Searched | No. of tokens | Minimum no. of similar tokens | Maximum no. of non-similar tokens |
FC Bayern Munich | 3 | 2 | 1 |
Leicester City | 2 | 1 | 1 |
Gazélec Football Club Olympique Ajaccio | 5 | 3 | 2 |
Club de Regatas Vasco da Gama | 6 | 3 | 3 |
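The arithmetic behind the table above can be sketched as follows. The function name `token_thresholds` is hypothetical, and tokens are assumed to be whitespace-separated words counted as in the table.

```python
import math


def token_thresholds(phrase: str, max_non_similar_ratio: float = 0.5) -> tuple[int, int, int]:
    """Return (token count, minimum similar tokens, maximum non-similar tokens)."""
    total = len(phrase.split())
    # The number computed from the percentage is rounded down ...
    max_non_similar = math.floor(total * max_non_similar_ratio)
    # ... and subtracted from the total to give the minimum.
    min_similar = total - max_non_similar
    return total, min_similar, max_non_similar


print(token_thresholds("FC Bayern Munich"))  # (3, 2, 1)
```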
Registered | Normalized (registered) | Searched (new) | Normalized (searched) | Condition evaluation |
R. AUBEL F.C.; R. ST. F.C. ANDRIMONT; R. LORCA F.C. NORD | R AUBEL; R ANDRIMONT; R LORCA NORD | R AUBEL FC | R AUBEL | true (R. AUBEL F.C.) |
Seattle Sounders FC | Seattle Sounders | Seattle Sounders FC U 23 | Seattle Sounders U 23 | false |
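The normalized values in the table above can be reproduced with a short sketch. It rests on several assumptions: the name `normalize_name` is hypothetical, NFKD decomposition stands in for transliteration and ICU folding, symbols are deleted without inserting a space (so "F.C." becomes "FC"), and stop tokens are compared case-insensitively while the original case is kept for display. The stop tokens are the international name stop tokens listed at the end of this page.

```python
import unicodedata

# International name stop tokens, as listed at the end of this page.
NAME_STOP_TOKENS = {
    "club", "sc", "soccer", "league", "sa", "fc",
    "united", "inc", "youth", "association", "el", "st",
}


def normalize_name(name: str) -> str:
    """Approximate normalization: fold accents, drop symbols, drop stop tokens."""
    # Stand-in for transliteration and ICU folding (assumption).
    decomposed = unicodedata.normalize("NFKD", name)
    folded = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Symbols removal: keep letters, digits and whitespace.
    cleaned = "".join(ch for ch in folded if ch.isalnum() or ch.isspace())
    # Stop words removal, case-insensitive.
    kept = [tok for tok in cleaned.split() if tok.casefold() not in NAME_STOP_TOKENS]
    return " ".join(kept)
```

For example, `normalize_name("R. ST. F.C. ANDRIMONT")` gives `R ANDRIMONT` and `normalize_name("Seattle Sounders FC U 23")` gives `Seattle Sounders U 23`.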
***For a duplicate to be returned:
- if <=3 fields are provided, all must be similar
- if >3 fields are provided, 75% must be similar
Address and town values are normalized with the following operations before comparison:
- text transliteration
- ICU folding
- symbols removal
- stop words removal (full list of address and town stop tokens can be found below)
The fields officialAddress.FullAddress and officialAddress.Town are mandatory. However, Member Associations are sometimes unable to provide this data during registration and enter "n/a" or "none" as a placeholder. The algorithm also tries to match such cases:
- if the searched organisation's town or fullAddress is "n/a" or "none", the field is considered a match: placeholders are treated as stop tokens, so the search engine removes them from the phrase before comparison (the full list of address and town stop tokens can be found below).
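The placeholder handling can be illustrated with a small sketch. The name `is_empty_after_stop_tokens` is hypothetical, and the check assumes that symbols are removed before stop tokens are applied, which is what turns "n/a" into the stop token "na".

```python
# Address and town stop tokens, as listed at the end of this page.
ADDRESS_STOP_TOKENS = {
    "na", "none", "tbd", "po", "box",
    "postal", "code", "address", "street", "road",
}


def is_empty_after_stop_tokens(value: str) -> bool:
    """Return True when nothing is left of the value once stop tokens are removed."""
    # Symbols removal: "n/a" becomes "na".
    cleaned = "".join(ch for ch in value if ch.isalnum() or ch.isspace())
    remaining = [tok for tok in cleaned.casefold().split() if tok not in ADDRESS_STOP_TOKENS]
    return not remaining
```

Under these assumptions, "n/a" and "none" leave nothing to compare, while a value such as "12 Main Street" still contributes the tokens "12" and "main".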
Text transliteration (Any-Latin)
Transliterates characters into Latin script and changes as many symbols as possible to ASCII (examples: « → '<<', © → '(C)', Æ → AE).
ICU folding
Applies Unicode folding (case folding and removal of accents and other diacritics). It behaves like the ASCII-folding filter, but covers a much wider range of characters.
Levenshtein distance: auto
Generates an edit distance based on the length of the term:
- 0..2 must match exactly
- 3..5 one edit allowed
- >5 two edits allowed
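The auto rule can be sketched in a few lines. The function names are hypothetical, and the sketch assumes that the allowed number of edits is derived from the length of the searched term and that plain Levenshtein distance (insert, delete, substitute) is used.

```python
def allowed_edits(term: str) -> int:
    """Edits allowed by the auto rule, based on the length of the term."""
    if len(term) <= 2:
        return 0  # 0..2 characters: must match exactly
    if len(term) <= 5:
        return 1  # 3..5 characters: one edit allowed
    return 2      # more than 5 characters: two edits allowed


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance, one row at a time."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                        # deletion
                current[j - 1] + 1,                     # insertion
                previous[j - 1] + (char_a != char_b),   # substitution
            ))
        previous = current
    return previous[-1]


def is_similar(searched: str, registered: str) -> bool:
    """True when the edit distance is within the auto limit for the searched term."""
    return levenshtein(searched, registered) <= allowed_edits(searched)
```

Applied to the squashed names from the squashing table, `is_similar("DVKEGEM", "DVKIZEGEM")` is True (two edits) and `is_similar("RSCTEMPLEUVE", "RSCTEMPLEUVOIS")` is False (three edits).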
Address and town stop tokens
- "na", "none", "tbd", "po", "box", "postal", "code", "address", "street", "road"
International name stop tokens
- "club", "sc", "soccer", "league", "sa", "fc", "united", "inc", "youth", "association", "el", "st"
Search for Facilities follows exactly the same pattern.