Customer Segmentation for E-commerce using K-means Clustering

Authors

Vishal Reddy Jakkareddy

PruthviRaj Amgoth

Published

April 14, 2025


Slides: slides.html


1 Introduction

Customer segmentation is an essential strategy in e-commerce that allows businesses to classify customers based on purchasing behaviors, enabling targeted marketing efforts and customer retention strategies. Traditional segmentation relied on demographic attributes, but with advancements in machine learning, data-driven approaches such as K-Means clustering have become widely adopted. This study leverages K-Means clustering to analyze an online retail dataset and identify distinct customer groups based on their purchase behaviors.

Recent research has highlighted the advantages of clustering algorithms in segmenting customers effectively. K-Means clustering remains a popular choice due to its efficiency and ease of implementation (Paramita and Hariguna 2024). Alternative methods such as K-Medoids and DBSCAN have also been explored, showing advantages in handling noise and varying cluster densities (Wu et al. 2022). This study applies K-Means clustering using R to segment customers from an online retail dataset, enabling a data-driven approach to customer analysis and decision-making.

1.1 Literature Review

1.1.1 K-Means Clustering in E-Commerce

The effectiveness of K-Means clustering for customer segmentation has been demonstrated in various studies. Tabianan et al. (Tabianan, Velu, and Ravi 2022) implemented the K-Means algorithm on a Malaysian e-commerce dataset, improving clustering accuracy by incorporating SAPK. Similarly, Rajput & Singh (Rajput and Singh 2023) analyzed session length, time spent on mobile and web platforms, and yearly spending to determine optimal customer segmentation using the elbow method.

1.1.2 Optimizing K-Means Clustering

Bradley et al. (Bradley, Bennett, and Demiriz 2000) introduced a constrained K-Means algorithm, which ensures a minimum number of points per cluster, improving the stability of segmentation. Kanungo et al. (Kanungo et al. 2000) proposed a filtering algorithm that optimizes K-Means clustering by using kd-trees, enhancing efficiency in large datasets.

Devaghi & Sudha (P and Sudha 2023) emphasized the significance of exploratory data analysis (EDA) combined with K-Means clustering for customer segmentation, highlighting the need for robust data preprocessing and feature selection.

1.1.3 Real-World Applications

Several studies illustrate the practical applications of K-Means clustering in e-commerce and retail. Agrawal et al. (Agrawal, Kaur, and Singh 2023) used a hybrid approach integrating the elbow method and K-Means clustering to analyze e-commerce data, demonstrating its effectiveness in identifying customer segments for personalized marketing. Arul et al. (Arul, Kumar, and Agarwal 2021) applied K-Means to mall customer data, providing insights that enhanced targeted advertising strategies.

2 Methods

2.1 K-Means Clustering Algorithm

The K-Means clustering algorithm is an iterative approach for partitioning a dataset into k clusters. It follows these steps to achieve convergence:

  1. Initialize Cluster Centers:
    • Select k initial cluster centroids at random, or use K-Means++, an initialization technique that places the initial centroids far apart to improve convergence and accuracy.
    • This ensures centroids are well-separated initially.
  2. Assign Data Points:
    • Compute the distance between each data point and all centroids.
    • Assign each data point to the nearest cluster based on a proximity measure (e.g., Euclidean distance).
    • The assignment is done by minimizing the distance:

\[ D(x_i, Z_k(I)) = \min_{j = 1, 2, \dots, k} D(x_i, Z_j(I)) \]

  3. Update Cluster Centers:
    • Recalculate the centroid of each cluster by computing the mean of all assigned points:

\[ Z_j(I) = \frac{1}{n_j} \sum_{i=1}^{n_j} X_i^{(j)} \]

Where:

  • \(X_i^{(j)}\) are the data points assigned to cluster j.
  • \(Z_j(I)\) is the centroid of cluster j, and \(n_j\) is the number of points assigned to it.
  4. Repeat Until Convergence:
    • Steps 2 and 3 are repeated iteratively until no significant change occurs in the centroids.
    • The algorithm stops when cluster assignments remain stable or meet a convergence threshold (Tabianan, Velu, and Ravi 2022).
    • The final clustering result aims to minimize the error square sum criterion:

\[ J = \sum_{j=1}^{k} \sum_{i=1}^{n_j} \| X_i^{(j)} - Z_j(I) \|^2 \]

Where J represents the total within-cluster sum of squared errors.

K-Means Clustering Algorithm (Image Source: Shiyu Gong)
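The four steps above can be sketched directly in R. This is a minimal illustration for exposition only, not the implementation used in the analysis below (which relies on R's built-in `kmeans()`); it assumes no cluster becomes empty during iteration.

```r
# Minimal K-Means sketch following steps 1-4 above (illustration only).
simple_kmeans <- function(X, k, max_iter = 100, seed = 42) {
  set.seed(seed)
  X <- as.matrix(X)
  # Step 1: initialize centroids by sampling k data points at random
  centers <- X[sample(nrow(X), k), , drop = FALSE]
  for (iter in seq_len(max_iter)) {
    # Step 2: assign each point to the nearest centroid (Euclidean distance)
    d <- as.matrix(dist(rbind(centers, X)))[-(1:k), 1:k]
    cluster <- max.col(-d)
    # Step 3: recompute each centroid as the mean of its assigned points
    # (assumes every cluster keeps at least one point)
    new_centers <- t(sapply(seq_len(k), function(j)
      colMeans(X[cluster == j, , drop = FALSE])))
    # Step 4: stop when the centroids no longer move
    if (max(abs(new_centers - centers)) < 1e-8) break
    centers <- new_centers
  }
  list(cluster = cluster, centers = centers)
}
```

Computing the full pairwise distance matrix keeps the sketch short; a production implementation would compute only point-to-centroid distances.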

2.2 Selection of k (Number of Clusters)

The optimal number of clusters k is determined using the Elbow Method, which plots the Within-Cluster Sum of Squares (WCSS), a measure of the variance within clusters, against candidate values of k. The elbow point, where the rate of decrease in WCSS diminishes, indicates the optimal k.
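The quantity plotted by the Elbow Method can be computed manually in R. Here `X` is a placeholder for a scaled feature matrix (the analysis below builds such a matrix as `customer_data_scaled`).

```r
# Manual elbow curve: total within-cluster sum of squares for k = 1..10.
# `X` is a placeholder for a scaled numeric feature matrix.
set.seed(42)
wcss <- sapply(1:10, function(k)
  kmeans(X, centers = k, nstart = 25)$tot.withinss)
plot(1:10, wcss, type = "b",
     xlab = "Number of clusters k", ylab = "Total WCSS")
```

The `fviz_nbclust()` call used later produces the same curve with nicer formatting.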

2.3 Proximity Measures

K-Means clustering assigns points to clusters based on distance metrics. The most commonly used metrics are:

  • Euclidean Distance (Most Common in K-Means):

\[ D_{euclidean}(x_1, x_2) = \sqrt{ \sum_{i=1}^{n} ((x_{1})_{i} - (x_{2})_{i})^2 } \]

  • Manhattan Distance (Useful for Grid-Based Data):

\[ D_{manhattan}(x_1, x_2) = \sum_{i=1}^{n} |(x_{1})_{i} - (x_{2})_{i}| \]

Choosing an appropriate distance metric affects cluster shape and performance (Bradley, Bennett, and Demiriz 2000).
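The two formulas translate directly into R; the numeric comments show the results for a small worked example.

```r
# Direct translations of the two distance formulas above.
euclidean_dist <- function(x1, x2) sqrt(sum((x1 - x2)^2))
manhattan_dist <- function(x1, x2) sum(abs(x1 - x2))

euclidean_dist(c(0, 0), c(3, 4))  # 5
manhattan_dist(c(0, 0), c(3, 4))  # 7
```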

3 Analysis and Results

3.1 Dataset Description

The Online Retail dataset, sourced from the UCI Machine Learning Repository, contains transactional data from a UK-based e-commerce store. This store specializes in selling unique giftware, which is frequently purchased in bulk by customers. The dataset spans transactions recorded between December 1, 2010, and December 9, 2011 (Chen 2015).

This dataset is particularly useful for customer segmentation, sales analysis, and market trend evaluation. It includes eight attributes that provide insights into customer purchases, product details, and order quantities. The data can be leveraged for analyzing buying behaviors, identifying customer clusters, and predicting future sales trends.

3.2 Dataset Overview

The dataset contains transactional records from an online retail store. The key attributes in the dataset include:

  • InvoiceNo: Unique invoice number for transactions
  • StockCode: Product code
  • Description: Product name
  • Quantity: Quantity purchased
  • InvoiceDate: Date and time of the purchase
  • UnitPrice: Price per unit
  • CustomerID: Unique identifier for customers
  • Country: Country where the transaction occurred

3.3 Data Preprocessing

Before clustering, the dataset requires cleaning and transformation. The preprocessing steps include handling missing values, removing duplicates, and transforming categorical data where necessary.

Code
# Data Manipulation and Cleaning
library(tidyverse)
library(readxl)

# Visualization
library(ggplot2)
library(patchwork)
library(gridExtra)
library(ggpubr)

# Data Exploration and Analysis
library(DT)
library(knitr)
library(gtsummary)
library(naniar)
library(VIM)
library(skimr)

# Clustering Analysis
library(factoextra)
library(cluster)
library(clValid)
library(NbClust)

# Load Dataset
file_path <- "~/customer_segmentation_group_project/Online_Retail.xlsx"
df <- read_excel(file_path)

# Display sample rows
kable(head(df), format = "html", caption = "First Few Records of the Dataset", table.attr = "class='table table-striped table-bordered'")
First Few Records of the Dataset
InvoiceNo StockCode Description Quantity InvoiceDate UnitPrice CustomerID Country
536365 85123A WHITE HANGING HEART T-LIGHT HOLDER 6 2010-12-01 08:26:00 2.55 17850 United Kingdom
536365 71053 WHITE METAL LANTERN 6 2010-12-01 08:26:00 3.39 17850 United Kingdom
536365 84406B CREAM CUPID HEARTS COAT HANGER 8 2010-12-01 08:26:00 2.75 17850 United Kingdom
536365 84029G KNITTED UNION FLAG HOT WATER BOTTLE 6 2010-12-01 08:26:00 3.39 17850 United Kingdom
536365 84029E RED WOOLLY HOTTIE WHITE HEART. 6 2010-12-01 08:26:00 3.39 17850 United Kingdom
536365 22752 SET 7 BABUSHKA NESTING BOXES 2 2010-12-01 08:26:00 7.65 17850 United Kingdom
Code
# Summary Statistics
skim(df)
Data summary
Name df
Number of rows 541909
Number of columns 8
_______________________
Column type frequency:
character 4
numeric 3
POSIXct 1
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
InvoiceNo 0 1 6 7 0 25900 0
StockCode 0 1 1 12 0 4070 0
Description 1454 1 1 35 0 4211 0
Country 0 1 3 20 0 38 0

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
Quantity 0 1.00 9.55 218.08 -80995.00 1.00 3.00 10.00 80995 ▁▁▇▁▁
UnitPrice 0 1.00 4.61 96.76 -11062.06 1.25 2.08 4.13 38970 ▁▇▁▁▁
CustomerID 135080 0.75 15287.69 1713.60 12346.00 13953.00 15152.00 16791.00 18287 ▇▇▇▇▇

Variable type: POSIXct

skim_variable n_missing complete_rate min max median n_unique
InvoiceDate 0 1 2010-12-01 08:26:00 2011-12-09 12:50:00 2011-07-19 17:17:00 23260

The dataset has 541,909 rows and 8 columns. CustomerID has 135,080 missing values (25% missing rate), and Description has 1,454 missing values. Quantity and UnitPrice show extreme values (e.g., negative quantities, very high prices), indicating the need for cleaning.

Code
# Visualizing Missing Values
aggr(df, col = c("navyblue", "red"), numbers = TRUE, sortVars = FALSE,
     cex.axis = 0.8, cex.lab = 1.2, cex.numbers = 1.2, main = "Missing Data Visualization")

The visualization shows that CustomerID has the highest proportion of missing values (0.25), often paired with missing Description values (0.01 proportion of combinations).

3.4 Data Cleaning and Transformation

Code
# Data Cleaning
df <- df %>%
  filter(!is.na(CustomerID)) %>%
  filter(!grepl("^C", InvoiceNo)) %>%
  filter(Quantity > 0, UnitPrice > 0) %>%
  mutate(TotalSpending = Quantity * UnitPrice) %>%
  mutate(InvoiceDate = as.Date(InvoiceDate)) %>%
  mutate(Description = replace_na(Description, "Unknown"))

This reduces the dataset to valid transactions, removing 135,080 rows with missing CustomerID, cancelled transactions (starting with “C”), and entries with non-positive Quantity or UnitPrice.

3.5 Exploratory Data Analysis

EDA provides insights into the dataset’s distributions and patterns.

Code
# Quantity Histogram
quantity_plot <- df %>%
  mutate(Quantity_Bins = cut(Quantity, breaks = c(-Inf, 1, 5, 10, 20, 30, 50, Inf))) %>%
  count(Quantity_Bins) %>%
  ggplot(aes(x = Quantity_Bins, y = n, fill = n)) +
  geom_bar(stat = "identity") +
  scale_fill_gradient(low = "blue", high = "darkblue") +
  labs(title = "A: Quantity Histogram", x = "Quantity", y = "Count") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))

# Price Histogram
price_plot <- df %>%
  mutate(Price_Bins = cut(UnitPrice, breaks = c(-Inf, 1, 2, 4, 6, 8, 10, 50, Inf))) %>%
  count(Price_Bins) %>%
  ggplot(aes(x = Price_Bins, y = n, fill = n)) +
  geom_bar(stat = "identity") +
  scale_fill_gradient(low = "blue", high = "darkblue") +
  labs(title = "B: Price Histogram", x = "Price", y = "Count") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))

# Top 5 StockCode
top_stockcode <- df %>%
  count(StockCode) %>%
  arrange(desc(n)) %>%
  head(5) %>%
  ggplot(aes(x = reorder(StockCode, -n), y = n, fill = StockCode)) +
  geom_bar(stat = "identity") +
  labs(title = "C: Top 5 StockCode", x = "StockCode", y = "Count") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))

# Top 5 Countries
top_countries <- df %>%
  count(Country) %>%
  arrange(desc(n)) %>%
  head(5) %>%
  ggplot(aes(x = reorder(Country, -n), y = n, fill = Country)) +
  geom_bar(stat = "identity") +
  labs(title = "D: Top 5 Country", x = "Country", y = "Count") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))

# Combine Plots
(quantity_plot | price_plot) /
(top_stockcode | top_countries)

  • Quantity Histogram (A): Most transactions involve quantities between 1 and 5, with a right-skewed distribution.
  • Price Histogram (B): Most items are priced between 1 and 4 units, also right-skewed.
  • Top 5 StockCode (C): StockCode 85123A has the highest sales count (~2,000 transactions).
  • Top 5 Countries (D): The United Kingdom dominates with over 300,000 transactions, followed by EIRE, France, Germany, and Spain.

3.6 RFM Analysis and Data Preparation

RFM analysis quantifies customer behavior using three metrics:

  • Recency: Days since the last purchase (relative to December 10, 2011).
  • Frequency: Number of transactions per customer.
  • Monetary: Total spending per customer.
Code
# Outlier Removal Function
remove_outliers <- function(data, column) {
  Q1 <- quantile(data[[column]], 0.25)
  Q3 <- quantile(data[[column]], 0.75)
  IQR_value <- Q3 - Q1
  lower_bound <- Q1 - 1.5 * IQR_value
  upper_bound <- Q3 + 1.5 * IQR_value
  data %>% filter(.data[[column]] >= lower_bound, .data[[column]] <= upper_bound)
}

# Removing Outliers
df <- df %>%
  remove_outliers("Quantity") %>%
  remove_outliers("UnitPrice")

# RFM Analysis Data Preparation
customer_data <- df %>%
  group_by(CustomerID) %>%
  summarise(
    Recency = as.numeric(as.Date("2011-12-10") - max(InvoiceDate)),
    Frequency = n(),
    Monetary = sum(TotalSpending)
  )

# Log Transformation
customer_data_log <- customer_data %>%
  mutate(
    Frequency = log1p(Frequency),
    Monetary = log1p(Monetary)
  )

# Standardizing Data for K-Means Clustering
customer_data_scaled <- customer_data_log %>%
  select(-CustomerID) %>%
  scale()

# Visualizing Transformed Data
p1 <- ggplot(customer_data_log, aes(y = Recency)) + geom_boxplot(fill = "#4CAF50")
p2 <- ggplot(customer_data_log, aes(y = Frequency)) + geom_boxplot(fill = "#FFC107")
p3 <- ggplot(customer_data_log, aes(y = Monetary)) + geom_boxplot(fill = "#2196F3")

grid.arrange(p1, p2, p3, ncol = 3)

  • Outliers in Quantity and UnitPrice are removed using the IQR method before aggregating to the customer level.
  • Log transformation (log1p) reduces skewness in the RFM metrics.
  • Data is scaled to ensure equal weighting of features during clustering.

The boxplots show that after transformation, the distributions are more symmetric, though some outliers remain (e.g., high Monetary values).

3.7 K-Means Clustering Analysis

3.7.1 Determining Optimal k

Code
# Elbow Method
set.seed(42)
fviz_nbclust(customer_data_scaled, kmeans, method = "wss") +
  geom_vline(xintercept = 3, linetype = 2) +
  labs(subtitle = "Elbow Method")

We use the Elbow Method to determine the optimal number of clusters k. The plot shows the Total Within Sum of Squares (WSS) for different values of k; WSS decreases as the number of clusters increases, but with diminishing returns. The elbow appears at k = 3, meaning three clusters segment the data well without overfitting: choosing more clusters would yield only marginal improvement while adding unnecessary complexity.

3.7.2 Applying K-Means Clustering

Code
# K-Means Clustering
set.seed(42)
kmeans_model <- kmeans(customer_data_scaled, centers = 3, nstart = 25)

# Add Cluster Labels
customer_data_log$Cluster <- as.factor(kmeans_model$cluster)

# Visualize Clusters
fviz_cluster(kmeans_model, 
             data = customer_data_scaled, 
             geom = "point",          
             show.clust.cent = TRUE, 
             ellipse.type = "convex",
             palette = "jco") +
  labs(title = "Customer Segmentation using K-Means Clustering")

The visualization shows the K-Means clusters projected onto the first two principal components, Dim1 and Dim2, which together capture approximately 95.6% of the total variance (Dim1: 73.3%, Dim2: 22.3%). The plot reveals three distinct customer segments: Cluster 1 (blue circles), Cluster 2 (yellow triangles), and Cluster 3 (grey squares). The separation between clusters is generally clear, though Clusters 2 and 3 overlap slightly, suggesting similar behavior among customers near the boundary. Because most of the variance is retained in just two dimensions, the projection gives a faithful, easily interpreted picture of the segmentation, which can inform targeted marketing, personalized services, and strategic business decisions.

3.7.3 Cluster Profiling

Cluster Profiling is the process of analyzing the characteristics of different customer segments after applying a clustering algorithm (like K-means). It helps us understand the behavior and value of each group, making it easier to tailor marketing strategies and business decisions. In this analysis, we’ve grouped customers into three clusters based on their Recency, Frequency, and Monetary values (RFM analysis). We then calculated the average values of these metrics for each cluster to interpret their purchasing behavior. By visualizing these summaries, we can clearly identify which clusters contain loyal customers, occasional buyers, or those at risk of churn—allowing businesses to take targeted actions accordingly.

Code
# Cluster Summary
cluster_summary <- customer_data_log %>%
  group_by(Cluster) %>%
  summarise(
    Avg_Recency = mean(Recency),
    Avg_Frequency = mean(Frequency),
    Avg_Monetary = mean(Monetary),
    Total_Customers = n()
  )

# Visualize Cluster Characteristics with Adjusted Margins
p1 <- ggplot(cluster_summary, aes(x = Cluster, y = Avg_Recency, fill = Cluster)) +
  geom_bar(stat = "identity") +
  labs(title = "Avg Recency by Cluster", y = "Avg. Recency") +
  scale_fill_manual(values = c("#4CAF50", "#FFC107", "#2196F3")) +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5, size = 12, margin = margin(b = 10)),
        plot.margin = margin(t = 20, r = 10, b = 10, l = 10))

p2 <- ggplot(cluster_summary, aes(x = Cluster, y = Avg_Frequency, fill = Cluster)) +
  geom_bar(stat = "identity") +
  labs(title = "Avg Frequency by Cluster", y = "Avg. Frequency") +
  scale_fill_manual(values = c("#4CAF50", "#FFC107", "#2196F3")) +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5, size = 12, margin = margin(b = 10)),
        plot.margin = margin(t = 20, r = 10, b = 10, l = 10))

p3 <- ggplot(cluster_summary, aes(x = Cluster, y = Avg_Monetary, fill = Cluster)) +
  geom_bar(stat = "identity") +
  labs(title = "Avg Monetary Value by Cluster", y = "Avg. Monetary") +
  scale_fill_manual(values = c("#4CAF50", "#FFC107", "#2196F3")) +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5, size = 12, margin = margin(b = 10)),
        plot.margin = margin(t = 20, r = 10, b = 10, l = 10))

# Use ggarrange for better layout control
ggarrange(p1, p2, p3, ncol = 3, common.legend = TRUE, legend = "right")

Code
# Display Cluster Summary
kable(cluster_summary, format = "html", caption = "Cluster Summary", table.attr = "class='table table-striped table-bordered'")
Cluster Summary
Cluster Avg_Recency Avg_Frequency Avg_Monetary Total_Customers
1 56.62587 2.970424 5.558872 1585
2 36.88193 4.776759 7.334640 1677
3 256.41873 2.603139 5.125849 929

3.7.4 Insights

  • Cluster 1 (Green): Moderate recency (~57 days), low frequency (~3 transactions), and moderate spending (~5.56 log units). These are occasional buyers.
  • Cluster 2 (Yellow): Low recency (~37 days), high frequency (~4.78 transactions), and high spending (~7.33 log units). These are loyal, high-value customers.
  • Cluster 3 (Blue): High recency (~256 days), low frequency (~2.60 transactions), and low spending (~5.13 log units). These are inactive customers at risk of churn.

3.8 Results

K-Means clustering identified three customer segments:

  1. Occasional Buyers (Cluster 1): Customers who purchase infrequently with moderate spending.
  2. Loyal Customers (Cluster 2): Frequent buyers with high spending, ideal for retention strategies.
  3. Inactive Customers (Cluster 3): Customers who haven’t purchased recently, requiring re-engagement campaigns.

These segments enable targeted marketing: loyalty programs for Cluster 2, re-engagement offers for Cluster 3, and promotional deals for Cluster 1.

This study demonstrates the effectiveness of K-Means clustering for customer segmentation in e-commerce. By applying RFM analysis and K-Means clustering to the Online Retail dataset, we identified three distinct customer segments with actionable insights for marketing strategies. Future work could explore alternative clustering methods (e.g., DBSCAN) or incorporate additional features like product categories to enhance segmentation.

4 References

Agrawal, Anshika, Puneet Kaur, and Monika Singh. 2023. “Customer Segmentation Model Using k-Means Clustering on e-Commerce.” In 2023 International Conference on Sustainable Computing and Data Communication Systems (ICSCDS), 1–6. https://doi.org/10.1109/ICSCDS56580.2023.10105070.
Arul, V., Ashutosh Kumar, and Aman Agarwal. 2021. “Segmenting Mall Customers Data to Improve Business into Higher Target Using k-Means Clustering.” In 2021 3rd International Conference on Advances in Computing, Communication Control and Networking (ICAC3N), 1602–4. https://doi.org/10.1109/ICAC3N53548.2021.9725630.
Bradley, Paul S, Kristin P Bennett, and Ayhan Demiriz. 2000. “Constrained k-Means Clustering.” Microsoft Research, Redmond 20 (0): 0.
Chen, Daqing. 2015. “Online Retail.” UCI Machine Learning Repository.
Kanungo, Tapas, David M. Mount, Nathan S. Netanyahu, Christine Piatko, Ruth Silverman, and Angela Y. Wu. 2000. “The Analysis of a Simple k-Means Clustering Algorithm.” In Proceedings of the Sixteenth Annual Symposium on Computational Geometry, 100–109. SCG ’00. New York, NY, USA: Association for Computing Machinery. https://doi.org/10.1145/336154.336189.
P, Devaghi., and S. Sudha. 2023. “Exploratory Data Analysis and Data Segmentation Using k Means Clustering.” In 2023 3rd International Conference on Advance Computing and Innovative Technologies in Engineering (ICACITE), 145–47. https://doi.org/10.1109/ICACITE57410.2023.10183143.
Paramita, Adi Suryaputra, and Taqwa Hariguna. 2024. “Comparison of k-Means and DBSCAN Algorithms for Customer Segmentation in e-Commerce.” Journal of Digital Market and Digital Currency 1 (1): 43–62.
Rajput, Lucky, and Shailendra Narayan Singh. 2023. “Customer Segmentation of e-Commerce Data Using k-Means Clustering Algorithm.” In 2023 13th International Conference on Cloud Computing, Data Science & Engineering (Confluence), 658–64. https://doi.org/10.1109/Confluence56041.2023.10048834.
Tabianan, Kayalvily, Shubashini Velu, and Vinayakumar Ravi. 2022. “K-Means Clustering Approach for Intelligent Customer Segmentation Using Customer Purchase Behavior Data.” Sustainability 14 (12): 7243.
Wu, Zengyuan, Lingmin Jin, Jiali Zhao, Lizheng Jing, and Liang Chen. 2022. “Research on Segmenting e-Commerce Customer Through an Improved k-Medoids Clustering Algorithm.” Computational Intelligence and Neuroscience 2022 (1): 9930613.