序
本文主要研究一下apache gossip的FailureDetector
FailureDetector
incubator-retired-gossip/gossip-base/src/main/java/org/apache/gossip/accrual/FailureDetector.java
public class FailureDetector { public static final Logger LOGGER = Logger.getLogger(FailureDetector.class); private final DescriptiveStatistics descriptiveStatistics; private final long minimumSamples; private volatile long latestHeartbeatMs = -1; private final String distribution; public FailureDetector(long minimumSamples, int windowSize, String distribution) { descriptiveStatistics = new DescriptiveStatistics(windowSize); this.minimumSamples = minimumSamples; this.distribution = distribution; } /** * Updates the statistics based on the delta between the last * heartbeat and supplied time * * @param now the time of the heartbeat in milliseconds */ public synchronized void recordHeartbeat(long now) { if (now <= latestHeartbeatMs) { return; } if (latestHeartbeatMs != -1) { descriptiveStatistics.addValue(now - latestHeartbeatMs); } latestHeartbeatMs = now; } public synchronized Double computePhiMeasure(long now) { if (latestHeartbeatMs == -1 || descriptiveStatistics.getN() < minimumSamples) { return null; } long delta = now - latestHeartbeatMs; try { double probability; if (distribution.equals("normal")) { double standardDeviation = descriptiveStatistics.getStandardDeviation(); standardDeviation = standardDeviation < 0.1 ? 0.1 : standardDeviation; probability = new NormalDistributionImpl(descriptiveStatistics.getMean(), standardDeviation).cumulativeProbability(delta); } else { probability = new ExponentialDistributionImpl(descriptiveStatistics.getMean()).cumulativeProbability(delta); } final double eps = 1e-12; if (1 - probability < eps) { probability = 1.0; } return -1.0d * Math.log10(1.0d - probability); } catch (MathException | IllegalArgumentException e) { LOGGER.debug(e); return null; } }}
- FailureDetector的构造器接收三个参数,分别是minimumSamples, windowSize, distribution
- 其中minimumSamples表示最少需要多少统计值的时候才真正计算phi值,windowSize表示统计窗口的大小,distribution表示使用哪种分布,normal表示NormalDistribution,其他表示ExponentialDistribution
- FailureDetector使用了apache commons math的DescriptiveStatistics来作为Heartbeat Interval的时间窗口统计;使用了NormalDistribution、ExponentialDistribution来完成正态分布、指数分布的累积分布probability,最后使用公式-1.0d * Math.log10(1.0d - probability)来计算phi值
小结
- 论文提出了基于phi值的Accrual Failure Detector方法
- 业界关于Failure Detector的实现大致有两种,一种是以akka为代表的按照论文基于NormalDistribution来计算;一种是以cassandra为代表的基于ExponentialDistribution来计算
- apache gossip的FailureDetector则集大成地同时支持了NormalDistribution及ExponentialDistribution两种实现方式