When a new Organisation is registered (or an existing one is updated), Connect ID searches for similar Organisations and returns an HTTP 409 response if a duplicate is found.

The following fields are compared to determine whether there is a match (italic fields are optional):

  • internationalName
  • internationalShortName
  • officialAddress.fullAddress
  • officialAddress.Town
  • officialAddress.Region
  • officialAddress.PostalCode
  • officialAddress.Country
  • localName and localOrganisationNames.Name (fields are combined)
  • localShortName and localOrganisationNames.shortName (fields are combined)


The algorithm works in the following way:

  • it searches within the same member association only (parentOrganisationFIFAId and country must match)
  • it searches only organisations with the same nature (organisationNature must match)
  • the following conditions are also applied:
    • if the internationalName of the new (searched) organisation is similar after squashing* or matches partially after normalization**, AND
    • at least 55% (e.g. five out of the nine) of the remaining fields (see above) are also similar***, a duplicate is returned
    • otherwise there is no match

Note that if NULL is passed for any of the fields, the search engine recalculates how many fields must match based on the non-empty fields only. The following table shows the exact behaviour of the search engine in this regard:
#matching fields / #non-empty fields | Match
4/8; 4/7; 3/6; 2/5; 2/4; 1/3; 1/2 | No
5/8; 5/7; 4/6; 3/5; 3/4; 2/3; 2/2 | Yes
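
The table can be read as a lookup from the number of non-empty fields to the minimum number of fields that must match. The sketch below encodes exactly the values listed in the table; the names `MIN_MATCHING_FIELDS` and `fields_match` are hypothetical and are not part of Connect ID.

```python
# Minimum number of matching fields for each count of non-empty fields.
# Values are copied from the table; this is an illustration, not the
# search engine's actual implementation.
MIN_MATCHING_FIELDS = {2: 2, 3: 2, 4: 3, 5: 3, 6: 4, 7: 5, 8: 5}


def fields_match(matching: int, non_empty: int) -> bool:
    """Return True if the table reports a match for matching/non_empty."""
    if non_empty not in MIN_MATCHING_FIELDS:
        raise ValueError(f"the table does not cover {non_empty} non-empty fields")
    return matching >= MIN_MATCHING_FIELDS[non_empty]


if __name__ == "__main__":
    for matching, non_empty in [(4, 8), (5, 8), (4, 7), (2, 3), (1, 2)]:
        verdict = "Yes" if fields_match(matching, non_empty) else "No"
        print(f"{matching}/{non_empty}: {verdict}")
```

Counts of non-empty fields that the table does not list raise an error instead of guessing a threshold.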


*Squashing consists of the following operations, performed in the given order:

  • text transliteration
  • ICU folding
  • symbols and spaces removal

Levenshtein distance is set to auto (you can find the explanation below).


Registered | Squashed (registered) | Searched (new) | Squashed (searched) | Condition evaluation
R. AUBEL | RAUBEL | R.AUBEL | RAUBEL | true
D.V.K. IZEGEM | DVKIZEGEM | DVK EGEM | DVKEGEM | true
ΕΡΜΗΣ ΑΠΟΛΛΩΝ | HERMESAPOLLO | HERMES APOLLO | HERMESAPOLLO | true
R.S.C. TEMPLEUVOIS | RSCTEMPLEUVOIS | RSC TEMPLEUVE | RSCTEMPLEUVE | false
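
A rough approximation of squashing can be written with the Python standard library. This sketch is an assumption-laden illustration: it folds accents with `unicodedata` and removes symbols and spaces, but it does not transliterate non-Latin scripts (that step needs an ICU transliterator), and the function name `squash` is hypothetical.

```python
import unicodedata


def squash(name: str) -> str:
    """Approximate squashing: fold accents, then drop symbols and spaces.

    Transliteration of non-Latin scripts is NOT covered by this sketch.
    """
    # Decompose characters so that accents become separate combining marks.
    decomposed = unicodedata.normalize("NFKD", name)
    # Drop the combining marks (a simplified stand-in for ICU folding).
    folded = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Keep letters and digits only; upper-case to mirror the table's notation.
    return "".join(ch for ch in folded if ch.isalnum()).upper()


if __name__ == "__main__":
    for raw in ["R. AUBEL", "R.AUBEL", "D.V.K. IZEGEM", "DVK EGEM"]:
        print(raw, "->", squash(raw))
```

The squashed strings are then compared using the Levenshtein distance described later in this page.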


**Normalization consists of the following operations, performed in the given order:

  • text transliteration
  • ICU folding
  • symbols removal
  • stop words removal (full list of international name stop tokens can be found below)
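
The normalization steps can be sketched as follows. This is a simplified illustration using only the standard library: transliteration of non-Latin scripts is omitted, accent stripping stands in for ICU folding, and the names `INTERNATIONAL_NAME_STOP_TOKENS` and `normalize_name` are hypothetical. The stop tokens are the international name stop tokens listed on this page.

```python
import unicodedata

# International name stop tokens as listed on this page.
INTERNATIONAL_NAME_STOP_TOKENS = {
    "club", "sc", "soccer", "league", "sa", "fc",
    "united", "inc", "youth", "association", "el", "st",
}


def normalize_name(name: str) -> str:
    """Approximate normalization of an international name."""
    # Simplified folding: decompose and drop combining accent marks.
    decomposed = unicodedata.normalize("NFKD", name)
    folded = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Symbols removal: keep letters, digits and whitespace only.
    no_symbols = "".join(ch for ch in folded if ch.isalnum() or ch.isspace())
    # Stop words removal, compared case-insensitively.
    tokens = [
        token for token in no_symbols.split()
        if token.lower() not in INTERNATIONAL_NAME_STOP_TOKENS
    ]
    return " ".join(tokens)


if __name__ == "__main__":
    print(normalize_name("R. ST. F.C. ANDRIMONT"))
    print(normalize_name("Seattle Sounders FC U 23"))
```

Symbols are removed without inserting a space, so "F.C." collapses to "FC" and is then dropped as a stop token.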


Two string fields are considered similar if no more than 50% of the tokens (words in the phrase) are non-similar. The number of tokens allowed to differ is computed from this percentage and rounded down; it is then subtracted from the total number of tokens to give the minimum number of similar tokens. For example:

Searched | No. of tokens | Minimum no. of similar tokens | Maximum no. of non-similar tokens
FC Bayern Munich | 3 | 2 | 1
Leicester City | 2 | 1 | 1
Gazélec Football Club Olympique Ajaccio | 5 | 3 | 2
Club de Regatas Vasco da Gama | 6 | 3 | 3
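
The arithmetic behind this table is short enough to write out. In the sketch below, the function name `token_thresholds` is hypothetical, and tokens are obtained by splitting on whitespace, which is an assumption about how the search engine tokenizes.

```python
def token_thresholds(phrase: str) -> tuple[int, int, int]:
    """Return (token count, minimum similar, maximum non-similar)."""
    token_count = len(phrase.split())
    # Up to 50% of the tokens may differ; the result is rounded down.
    max_non_similar = token_count // 2
    # The remainder is the minimum number of tokens that must be similar.
    min_similar = token_count - max_non_similar
    return token_count, min_similar, max_non_similar


if __name__ == "__main__":
    for phrase in ["FC Bayern Munich", "Leicester City"]:
        print(phrase, token_thresholds(phrase))
```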


Registered | Normalized (registered) | Searched (new) | Normalized (searched) | Condition evaluation
R. AUBEL F.C.; R. ST. F.C. ANDRIMONT; R. LORCA F.C. NORD | R AUBEL; R ANDRIMONT; R LORCA NORD | R AUBEL FC | R AUBEL | R. AUBEL F.C.
Seattle Sounders FC | Seattle Sounders | Seattle Sounders FC U 23 | Seattle Sounders U 23 | false


***To return a duplicate:

  • if <=3 fields are provided, all must be similar
  • if >3 fields are provided, 75% must be similar


Tokens are considered similar if they match exactly or fall within the Levenshtein distance (set to auto) after the following filtering:
  • text transliteration
  • ICU folding
  • symbols removal
  • stop words removal (full list of address and town stop tokens can be found below)


The fields officialAddress.FullAddress and officialAddress.Town are mandatory. However, Member Associations are sometimes unable to provide this data during registration and enter "n/a" or "none" as a placeholder. The algorithm also tries to match such cases:

  • if the town or fullAddress of the searched organisation is "n/a" or "none", it is considered a match, because placeholders are treated as stop tokens: the search engine removes them from the phrase before comparison (the full list of address and town stop tokens can be found below).
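
The placeholder handling can be illustrated with a short sketch. It assumes that symbols are removed before stop tokens are filtered, so "n/a" becomes "na"; the names `ADDRESS_STOP_TOKENS` and `address_tokens` are hypothetical, and the stop tokens are the address and town stop tokens listed on this page.

```python
# Address and town stop tokens as listed on this page.
ADDRESS_STOP_TOKENS = {
    "na", "none", "tbd", "po", "box", "postal",
    "code", "address", "street", "road",
}


def address_tokens(value: str) -> list[str]:
    """Return the tokens that remain after symbol and stop-token removal."""
    # Symbols removal: "n/a" becomes "na" because the slash is dropped.
    no_symbols = "".join(ch for ch in value if ch.isalnum() or ch.isspace())
    return [
        token for token in no_symbols.split()
        if token.lower() not in ADDRESS_STOP_TOKENS
    ]


if __name__ == "__main__":
    # A placeholder leaves no tokens behind, so nothing remains to compare.
    print(address_tokens("n/a"))
    print(address_tokens("12 Main Street"))
```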

Text transliteration (Any-Latin)
Transliterates characters into Latin script and converts as many symbols as possible to ASCII (examples: « → '<<', © → '(C)', Æ → 'AE').


ICU folding
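Applies Unicode folding to characters, for example case folding and the removal of accents and diacritics. It behaves like an ASCII-folding filter extended to the whole Unicode range.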


Levenshtein distance: auto

Generates an edit distance based on the length of the term:

  • 0..2 characters: must match exactly
  • 3..5 characters: one edit allowed
  • >5 characters: two edits allowed
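
The length-based rule can be sketched together with a plain Levenshtein distance. This is an illustration under assumptions: the allowed number of edits is derived from the length of the searched term, transpositions are not treated as a single edit, and the function names are hypothetical.

```python
def allowed_edits(term: str) -> int:
    """Number of edits allowed for a term under the auto setting."""
    length = len(term)
    if length <= 2:
        return 0
    if length <= 5:
        return 1
    return 2


def levenshtein(a: str, b: str) -> int:
    """Classic edit distance (insertions, deletions, substitutions)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


def fuzzy_match(searched: str, registered: str) -> bool:
    """True if the distance fits the budget for the searched term."""
    return levenshtein(searched, registered) <= allowed_edits(searched)


if __name__ == "__main__":
    print(fuzzy_match("DVKEGEM", "DVKIZEGEM"))
    print(fuzzy_match("RSCTEMPLEUVE", "RSCTEMPLEUVOIS"))
```

With the squashed examples from this page, "DVKEGEM" is two edits away from "DVKIZEGEM" and fits the budget, while "RSCTEMPLEUVE" is three edits away from "RSCTEMPLEUVOIS" and does not.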

Address and town stop tokens:
  • "na", "none", "tbd", "po", "box", "postal", "code", "address", "street", "road"

International name stop tokens:
  • "club", "sc", "soccer", "league", "sa", "fc", "united", "inc", "youth", "association", "el", "st"

Search for Facilities follows exactly the same pattern.