This study employs a mathematical modeling approach.
General
In the mathematical modeling process, we draw on the research of adan_queueing2002 regarding queuing theory, specifically focusing on student arrival distribution between the third and fourth class sessions, the service time distribution at service windows, and the observed number of windows. For 10 Days/2weeks.
Modeling
1. Data Analysis
Arrival data analysis
Plot the collected data (a person entering the cafeteria at a certain time) as a histogram
Attempt to fit common distributions (Normal, Poisson, Exponential, Lognormal, Gamma, etc.) to determine the specific distributional form of the arrival and service process.
The collected data were entered into a computer and analyzed for summary statistics and visual representations across different variables using a statistical system. The frequency distribution of the survey results is shown in Figure XX.
We reviewed the observed data and removed certain outliers. or:
Enhanced outlier handling methods
Direct deletion of outliers may result in loss of information and is recommended:
Use statistical methods (e.g., interquartile distance for box-and-line plots) to identify outliers and analyze their causes.
If the outliers have special significance (e.g., congestion due to sudden activity), they can be modeled or recorded separately.
General
Main
1. Input data:
- Enter data (Each value represents a time that a student arrive the cafeteria)
2. Distribution fitting:
- Use the distributions (normal, Poisson, exponential, lognormal, gamma) from the scipy.stats module.
- Fit each distribution and estimate its parameters.
3. Model selection criteria:
- AIC (Akaike Information Criterion): a measure of model fit and smoothness.
- BIC (Bayesian Information Criterion): considers the effect of the number of data points on model fit.
4. Output results:
- Name of the fitted distribution, parameters, log-likelihood values, AIC and BIC.
- The optimal distribution is selected by the smaller AIC/BIC value.
Extra
Addition of distributional tests:
Kolmogorov-Smirnov (K-S) test: checks that the data fits the fitted distribution.
Anderson-Darling test: more sensitive to fitting the tails of the distribution.
Multi-distribution combination test:
If a single distribution is not fitted well, try a Mixed Distribution Model (e.g. Mixed Gaussian) to capture complex patterns.
The probability density function (PDF) of the Gaussian Mixture Model is:
p(x)=k=1∑KπkN(x∣μk,σk2)
Where:
K is the number of Gaussian distributions (i.e., the number of mixture components).
πk is the weight of the k-th Gaussian distribution, satisfying ∑k=1Kπk=1.
N(x∣μk,σk2) represents the k-th Gaussian distribution, with mean μk and variance σk2.
import numpy as npimport scipy.stats as statsimport pandas as pdarrival_times = list(map(float, input("Please Enter the values(Split by Space):").split()))# This part can be modified after the data is collecteddistributions = { "Normal": stats.norm, "Poisson": stats.poisson, "Exponential": stats.expon, "Lognormal": stats.lognorm, "Gamma": stats.gamma}results = []for name, distribution in distributions.items(): try: params = distribution.fit(arrival_times) pdf = distribution.pdf(arrival_times, *params) log_likelihood = np.sum(np.log(pdf)) # AIC k = len(params) aic = 2 * k - 2 * log_likelihood # BIC n = len(arrival_times) bic = k * np.log(n) - 2 * log_likelihood results.append({ "Distribution": name, "Parameters": params, "Log-Likelihood": log_likelihood, "AIC": aic, "BIC": bic }) except Exception as e: results.append({ "Distribution": name, "Error": str(e) })# DataFrameresults_df = pd.DataFrame(results)import ace_tools as tools; tools.display_dataframe_to_user(name="Distribution Fit Results", dataframe=results_df)
2. Model Establishment
Based on assumptions about the data and usual observations, we assume that the arrival distribution satisfies a certain distribution (e.g., skewed distribution). Most students usually enter the cafeteria right after classes end, and the arrival rate of people decreases steadly over time. We assume that the service process is a poisson process, i.e., each service time is completely random (not depend on any previous situation).
So the G/M/c model should be used. (Note: the actual model selection needs to be based on actual data)
In the G/M/c model, the arrival process follows a General distribution, the service time follows a Poisson process, and there are c parallel service windows.
Basic Parameters
Symbol
Explanation
Unit
λ
Average arrival rate of customers
people/s
μ
Average service rate of a single service window
services/s
c
Number of service windows
Number
ρ
system utilization rate
/
Ca
Coefficient of variation
/
Wq
Mean Waiting Time
s
Lq
Average number of people in the queue
Number
P0
Probability that the system is empty
/
Which:
ρ=cμλ
where ρ<1 is necessary for system stability.
Ca: coefficient of variation (ratio of standard deviation to mean) of the arrival time distribution:
Ca=1/λσa
Standard deviation of arrival times σa
We have a Set of data [Ta1,Ta2,…,Tan],We can estimate σa by:
σa=n−11i=1∑n(Tai−Taˉ)2
Taˉ=n1∑i=1nTai is the average of arrival times
Mean Waiting Time Wq
Using a generalized form of the Pollaczek-Khinchin (P-K) formula, you can compute the average wait time in the queue Wq:
Wq=2Ca2+1⋅Wq,M/M/c
where Wq,M/M/c is the average waiting time of the M/M/c model, Eq:
Wq,M/M/c=λLq,M/M/c
Average number of people in the queue Lq
And Lq,M/M/c (the average number of people in the queue) is based on the Erlang-C formula:
P0: the probability that the system is empty, which can be calculated by the following formula:
P0=(k=0∑c−1k!(λ/μ)k+c!(λ/μ)c⋅1−ρ1)−1
Average system wait time W
The average wait time in the system W includes the wait time in the queue Wq and the service time:
W=Wq+μ1
Average number of people in the system L
The average number of people in the system L includes the number of people in the queue Lq and the number of people being served:
L=Lq+μλ
Extra
Dynamic modeling: student arrival rates and service times are currently assumed to be fixed distributions, but may vary in real scenarios (e.g., peak hour fluctuations). It is possible to:
Introduce time-varying parameter models (e.g., time segmentation, smoothing functions).
Use Markov chain models to simulate dynamic service window states.
Explore multi-model comparisons:
In addition to G/M/c, try G/G/c or more complex models (e.g., queuing networks) to analyze the overall service system.
Compare model performance and choose the model that best fits the scenario.
3. Model Using
import mathdef calculate_queue_metrics(mu, lambd, c, sigma_a=None): # Step 1: Calculate rho rho = lambd / (c * mu) if rho >= 1: raise ValueError("The system is unstable (rho >= 1). Please adjust input parameters.") # Step 2: Calculate C_a C_a = (sigma_a * lambd) if sigma_a else 1 # Default to 1 if sigma_a is not provided # Step 3: Calculate P0 sum_k = sum((lambd / mu)**k / math.factorial(k) for k in range(c)) P0 = 1 / (sum_k + (lambd / mu)**c / math.factorial(c) / (1 - rho)) # Step 4: Calculate L_q (classical M/M/c) L_q_MM_c = (P0 * ((lambd / mu)**c / math.factorial(c)) * (c * rho / (1 - rho)**2)) / ( sum_k + (lambd / mu)**c / math.factorial(c) / (1 - rho) ) # Step 5: Adjust L_q for generalized case L_q = (C_a**2 + 1) / 2 * L_q_MM_c # Step 6: Calculate W_q W_q = L_q / lambd # Step 7: Calculate W W = W_q + 1 / mu # Step 8: Calculate L L = L_q + lambd / mu return {"W": W, "L": L, "P0": P0}# Example usagemu = 5 # Service ratelambd = 4 # Arrival ratec = 3 # Number of serverssigma_a = 0.1 # Optional, variability of arrival timeresult = calculate_queue_metrics(mu, lambd, c, sigma_a)print(result)
FootNote
Use Ca (Coefficient of Variation of Arrival Time) to correct the formula of M/M/c, that is, we can get the result of G/M/c.
Outputs
We enter the observed data into the code segment and conclude that
The average arrival time is…
The average number of people in the queue is…
Extra
Enhanced parameter estimation methods: Currently, parameter estimation is mainly based on best fit, but a more rigorous calculation of confidence intervals should be incorporated.
Calculate confidence intervals for parameter estimates using the Bootstrapping method.
Check the stability of the estimates by estimating them separately for different time periods and
The data for this study came from field observations in our school’s cafeteria, and data were collected from December 2 to December 5, 2024, and December 9 to December 12, 2024 (8 days in total). Data collection under each schedule lasted for four days due to the school’s schedule of one cycle every two days. The data primarily covered the third and fourth periods which are lunch periods, and the peak hours. Specifically, it includes the number of students entering/leaving the cafeteria during each period, the service time of each window, and the number of windows.
Methodology
Data collection was performed through the cafeteria’s monitoring system during the daytime hours, which recorded the timestamp of each student’s entry into the cafeteria. The specific time period for data collection was from 11:50 to 13:35 each day. During this time period, the number of students entering the cafeteria was recorded by manual count, and the service time of each service window was recorded by simple observations. The mean value of the window service time is the average value obtained by several manual timings.
Data preprocessing
During the data cleaning process, all the collected data met the preset criteria, so no data was eliminated. During the data preprocessing stage, timestamps were converted to relative times with respect to peak hours and grouped for counting at 2-minute intervals during peak hours. For off-peak hours, group counts were performed at 5-minute intervals.