data3.py

Created by juliette-1

Created on December 15, 2023

1.86 KB


'''
Quantitative distance: measured with the Euclidean distance between x and x’.
FORMULA:
d_euc(x, x’) = sqrt( sum for i = 1 to K of (x_i - x’_i)^2 ), where K is the number of quantitative variables.
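A minimal Python sketch of this formula, using only the standard math module. The function name euclidean and the sample points are illustrative and not part of the course.

```python
import math

def euclidean(x, y):
    # x and y are sequences of K numeric (quantitative) features
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

print(euclidean([1, 2, 3], [4, 6, 3]))  # 5.0, since sqrt(9 + 16 + 0) = 5
```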
 
Categorical or qualitative characteristics: measured with the Hamming distance.
Examples: organic/non-organic, smoker/non-smoker, eye color, vegan/vegetarian/meat eater. Whatever the number of categories, the distance for one variable takes only two values: 0 (same category) or 1 (different categories).
FORMULA:
Hamming distance:
d_ham(x1, x1’) = 0 if x1 = x1’
d_ham(x1, x1’) = 1 if x1 ≠ x1’
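A minimal sketch of the per-variable Hamming distance in Python; the function name hamming and the example values are illustrative. It works the same way for a two-category variable and for a multi-category one such as eye color.

```python
def hamming(a, b):
    # Distance between two values of ONE qualitative variable:
    # 0 if they are the same category, 1 otherwise
    return 0 if a == b else 1

print(hamming("smoker", "smoker"))  # 0
print(hamming("blue", "brown"))     # 1
```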

Many datasets contain both continuous and qualitative variables.
Aggregate distance: the sum of the quantitative and qualitative distances.
FORMULA:
d(x, x’) = d_euc + d_ham, where d_ham is itself a sum over the qualitative variables (0 or 1 for each), so it can be as large as 1 + 1 + ... + 1, i.e. the number of qualitative variables.
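A minimal sketch of the aggregate distance for one pair of individuals, assuming the quantitative and qualitative variables are passed as separate lists. The function name mixed_distance and the data are hypothetical.

```python
import math

def mixed_distance(x_num, y_num, x_cat, y_cat):
    # Euclidean part over the quantitative variables
    d_euc = math.sqrt(sum((a - b) ** 2 for a, b in zip(x_num, y_num)))
    # Hamming part: +1 for every qualitative variable whose categories differ
    d_ham = sum(1 for a, b in zip(x_cat, y_cat) if a != b)
    return d_euc + d_ham

# Hypothetical individuals: two numeric features + (smoker status, diet)
d = mixed_distance([2.0, 1.0], [5.0, 5.0], ["smoker", "vegan"], ["non-smoker", "vegan"])
print(d)  # 6.0 = 5.0 (Euclidean) + 1 (one qualitative variable differs)
```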
 

Clustering 
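Hierarchical clustering: creates clusters from the proximity matrix (the table of pairwise distances between items).
There are several ways to build groups from the proximity matrix:
1. Minimum linkage clustering (also known as single linkage): the distance between two clusters is the smallest distance between an item of one and an item of the other.
2. Maximum linkage clustering (also known as complete linkage): the distance between two clusters is the largest such distance.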
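A minimal sketch of the two linkage rules, assuming the proximity matrix is stored as a Python dict of dicts keyed by item name. The function names and the 3-item matrix are hypothetical.

```python
def single_linkage(c1, c2, dist):
    # Minimum linkage: distance of the closest pair (one item from each cluster)
    return min(dist[a][b] for a in c1 for b in c2)

def complete_linkage(c1, c2, dist):
    # Maximum linkage: distance of the farthest pair (one item from each cluster)
    return max(dist[a][b] for a in c1 for b in c2)

# Hypothetical symmetric proximity matrix
dist = {
    "A": {"A": 0, "B": 2, "C": 6},
    "B": {"A": 2, "B": 0, "C": 5},
    "C": {"A": 6, "B": 5, "C": 0},
}
print(single_linkage(["A", "B"], ["C"], dist))    # 5
print(complete_linkage(["A", "B"], ["C"], dist))  # 6
```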

We have 9 clusters: A, B, C, D, E, F, G, H, I.
Merge the two clusters at minimum distance (here D and F).
We have 8 clusters: A, B, C, D’, E, G, H, I (D’ = D + F).
Again merge the two clusters at minimum distance (here C and D’).
We have 7 clusters: A, B, D’’, E, G, H, I (D’’ = C + D’).
We keep iterating until all the items form a single cluster; starting from 9 items, this takes 8 merges.
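The iteration above can be sketched as a naive agglomerative loop: at each step, find the two clusters at minimum linkage distance and merge them. This is an illustrative O(n^3) version, not an optimized algorithm; the function name agglomerate and the 4-item matrix are hypothetical. Passing linkage=min gives single linkage and linkage=max gives complete linkage.

```python
def agglomerate(dist, linkage=min):
    # dist: symmetric proximity matrix as a dict of dicts keyed by item name
    clusters = [[name] for name in dist]
    merges = []
    while len(clusters) > 1:
        best = None
        # Find the pair of clusters with the smallest linkage distance
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = linkage(dist[a][b] for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        # Remove the two merged clusters (j first, since j > i), add their union
        del clusters[j]
        del clusters[i]
        clusters.append(merged)
    return merges

dist = {
    "A": {"A": 0, "B": 1, "C": 4, "D": 6},
    "B": {"A": 1, "B": 0, "C": 3, "D": 5},
    "C": {"A": 4, "B": 3, "C": 0, "D": 2},
    "D": {"A": 6, "B": 5, "C": 2, "D": 0},
}
for left, right, d in agglomerate(dist):
    print(left, "+", right, "at distance", d)
```

With n items the loop performs n - 1 merges. The recorded distances are the heights at which a dendrogram draws each merge.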

Dendrogram: a plot to visualize the clusters and the order in which they merge.
NOTE: the highest link, the one that joins all the other items, is the last merge. Conversely, the link closest to the x-axis is the first cluster formed.
 
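Standardization of a variable:
x_std,i = (x_i - mean) / SD, for i = 1 to N, where mean = 1/N * sum for i = 1 to N of x_i and SD is the standard deviation of the variable.
Note: the mean and the standard deviation are computed separately for each column.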
 
Standardized data: gives the same weight to each variable in the dataset and puts all the variables on the same scale.
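A minimal sketch of column-wise standardization in Python. It assumes the population standard deviation (dividing by N); the course notes do not say whether N or N - 1 is used. It also assumes no column is constant (SD = 0 would cause a division by zero). The function name standardize and the data are hypothetical.

```python
import math

def standardize(rows):
    # rows: list of observations; each observation lists one value per column
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    # Population standard deviation of each column (divides by N)
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / n) for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]

# Two variables on very different scales (e.g. age vs. income)
data = [[20, 1000], [30, 2000], [40, 3000]]
print(standardize(data))
```

After standardization both columns have mean 0 and SD 1, so here they become identical. Without this step, the second column would dominate any Euclidean distance computed on the raw data.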

'''
