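I. Cause
With Java NIO on Linux, selector.select() is supposed to block until a channel becomes ready or wakeup() is called. Because of a JDK bug, however, the call can return immediately with no ready channels and no wakeup. The event loop then spins without doing any work, and CPU usage climbs to 100%.
The bug is seen on Linux because the NIO Selector there is implemented on top of epoll, and the JDK's epoll-based implementation is where the bug lives: select() wakes up even though nothing was selected. On Windows the Selector is not built on epoll, so this bug does not occur there.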
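To make the expected behaviour concrete, here is a minimal sketch using only the JDK. The class and method names (BlockingSelectDemo, timedSelectMillis) are made up for illustration. With no channels registered and no wakeup(), select(timeout) should block for roughly the timeout; on a JVM affected by the bug it could return much earlier with 0 ready keys.

```java
import java.io.IOException;
import java.nio.channels.Selector;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class, not part of Netty or the JDK.
final class BlockingSelectDemo {

    /** Calls select(timeoutMillis) and returns how long the call took, in milliseconds. */
    static long timedSelectMillis(Selector selector, long timeoutMillis) throws IOException {
        long start = System.nanoTime();
        int ready = selector.select(timeoutMillis);
        long elapsed = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        System.out.println("ready=" + ready + ", elapsed=" + elapsed + " ms");
        return elapsed;
    }

    public static void main(String[] args) throws IOException {
        try (Selector selector = Selector.open()) {
            // Nothing is registered and nobody calls wakeup(),
            // so this call is expected to block for about 200 ms.
            timedSelectMillis(selector, 200);
        }
    }
}
```

An elapsed time far below the timeout, repeated on every iteration of an event loop, is the symptom described above.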
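How Netty works around the bug
1. Netty calls selector.select(timeout) with a timeout. A select() call can stop blocking in four situations:
- an event occurred
- wakeup() was called
- the timeout expired
- the empty-polling bug was hit
In the first two cases Netty breaks out of the loop: either the return value is non-zero, or the wakenUp flag is set. A timeout is recognized by comparing timestamps taken before and after the call. Any other return is counted as an empty poll and increments a dedicated counter (selectCnt); once the counter reaches the threshold, 512 by default, Netty assumes the empty-polling bug has been triggered.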
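The counting rule can be isolated from the selector itself. The following is a minimal sketch, not Netty's code: SpinDetector and onEmptySelect are hypothetical names, and the timestamps are passed in so the logic can be exercised without a real Selector. It assumes the caller has already handled the "event" and "wakeup" cases and only reports select() calls that returned 0.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper that mirrors the selectCnt idea; not part of Netty.
final class SpinDetector {
    private final int threshold; // Netty's default threshold is 512
    private int selectCnt;

    SpinDetector(int threshold) {
        this.threshold = threshold;
    }

    /**
     * Records one select() call that returned 0 without a wakeup.
     *
     * @return true if the caller should rebuild its selector
     */
    boolean onEmptySelect(long startNanos, long endNanos, long timeoutMillis) {
        selectCnt++;
        if (endNanos - startNanos >= TimeUnit.MILLISECONDS.toNanos(timeoutMillis)) {
            // The call really blocked for the whole timeout: a normal return.
            selectCnt = 1;
            return false;
        }
        if (threshold > 0 && selectCnt >= threshold) {
            // Too many premature returns in a row: assume the empty-polling bug.
            selectCnt = 1;
            return true;
        }
        return false;
    }

    int count() {
        return selectCnt;
    }
}
```

With a threshold of 3, for example, three consecutive premature returns make the third call return true, while a single call that lasted the full timeout resets the counter.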
II. The fix
Once the bug is detected, Netty builds a new selector, re-registers the existing channels on it, and closes the old selector:
    // Excerpt from Netty's NioEventLoop; unrelated lines are omitted.
    private void select(boolean oldWakenUp) throws IOException {
        int selectCnt = 0;
        // Timestamp taken before select(); used below to detect a genuine timeout.
        long currentTimeNanos = System.nanoTime();
        for (;;) {
            // timeoutMillis is derived from the next scheduled-task deadline (computation omitted).
            int selectedKeys = selector.select(timeoutMillis);
            selectCnt ++;

            if (selectedKeys != 0 || oldWakenUp || wakenUp.get() || hasTasks() || hasScheduledTasks()) {
                // - Selected something,
                // - waken up by user, or
                // - the task queue has a pending task.
                // - a scheduled task is ready for processing
                break;
            }
            long time = System.nanoTime();
            if (time - TimeUnit.MILLISECONDS.toNanos(timeoutMillis) >= currentTimeNanos) {
                // timeoutMillis elapsed without anything selected: a normal timeout.
                selectCnt = 1;
            } else if (SELECTOR_AUTO_REBUILD_THRESHOLD > 0 &&
                    selectCnt >= SELECTOR_AUTO_REBUILD_THRESHOLD) { // default threshold: 512
                // The code exists in an extra method to ensure the method is not too big to inline as this
                // branch is not very likely to get hit very frequently.
                // Each premature return increments selectCnt. Once it reaches 512 without a real
                // timeout in between, assume the empty-polling bug and rebuild the selector.
                selector = selectRebuildSelector(selectCnt);
                selectCnt = 1;
                break;
            }
            // Measure the next iteration from this point.
            currentTimeNanos = time;
        }
    }

    /**
     * Replaces the current {@link Selector} of this event loop with newly created {@link Selector}s to work
     * around the infamous epoll 100% CPU bug.
     */
    public void rebuildSelector() {
        if (!inEventLoop()) {
            // Called from another thread: hand the rebuild over to the event loop thread.
            execute(new Runnable() {
                @Override
                public void run() {
                    rebuildSelector0();
                }
            });
            return;
        }
        rebuildSelector0();
    }

    // Excerpt: the try/catch blocks around open, register, and close are omitted.
    private void rebuildSelector0() {
        final Selector oldSelector = selector;
        final SelectorTuple newSelectorTuple;

        // Open a new selector.
        newSelectorTuple = openSelector();

        // Re-register every channel of the old selector on the new one.
        int nChannels = 0;
        for (SelectionKey key: oldSelector.keys()) {
            Object a = key.attachment();
            // Skip keys that are no longer valid or already registered on the new selector.
            if (!key.isValid() || key.channel().keyFor(newSelectorTuple.unwrappedSelector) != null) {
                continue;
            }
            int interestOps = key.interestOps();
            key.cancel();
            SelectionKey newKey = key.channel().register(newSelectorTuple.unwrappedSelector, interestOps, a);
            if (a instanceof AbstractNioChannel) {
                // Update SelectionKey
                ((AbstractNioChannel) a).selectionKey = newKey;
            }
            nChannels ++;
        }

        selector = newSelectorTuple.selector;
        unwrappedSelector = newSelectorTuple.unwrappedSelector;
        // time to close the old selector as everything else is registered to the new one
        oldSelector.close();
    }
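1. selector.select(timeoutMillis) is called, and selectCnt is incremented after each call.
2. The current time is then read to check whether select() really blocked for timeoutMillis. If it did, this was a normal timed-out select() and selectCnt is reset to 1. If it did not, the call returned early and may have hit the JDK empty-polling bug, so Netty checks whether selectCnt has reached the threshold (512 by default). If it has, the loop really is spinning on empty polls, and the selector is rebuilt via rebuildSelector().
3. rebuildSelector() opens a new Selector, iterates over the keys of oldSelector, and re-registers every channel on the new Selector. It then reassigns selector and closes the old one; back in select(), selectCnt is reset to 1. At that point the empty-polling condition has been sidestepped.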
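The rebuild step in point 3 can also be tried outside Netty. The sketch below uses only java.nio and is an illustration, not Netty's implementation: SelectorRebuild and rebuild are hypothetical names, and a Pipe stands in for a real network channel so the example needs no sockets.

```java
import java.io.IOException;
import java.nio.channels.Pipe;
import java.nio.channels.SelectableChannel;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;

// Hypothetical standalone illustration; not part of Netty.
final class SelectorRebuild {

    /** Moves every valid registration to a fresh Selector, then closes the old one. */
    static Selector rebuild(Selector oldSelector) throws IOException {
        Selector newSelector = Selector.open();
        for (SelectionKey key : oldSelector.keys()) {
            SelectableChannel channel = key.channel();
            // Skip cancelled keys and channels already known to the new selector.
            if (!key.isValid() || channel.keyFor(newSelector) != null) {
                continue;
            }
            int interestOps = key.interestOps();
            Object attachment = key.attachment();
            key.cancel();
            channel.register(newSelector, interestOps, attachment);
        }
        // Everything is registered on the new selector, so the old one can go.
        oldSelector.close();
        return newSelector;
    }

    public static void main(String[] args) throws IOException {
        Selector old = Selector.open();
        Pipe pipe = Pipe.open();
        pipe.source().configureBlocking(false);
        pipe.source().register(old, SelectionKey.OP_READ, "demo-attachment");

        Selector fresh = rebuild(old);
        System.out.println("old open: " + old.isOpen());
        System.out.println("keys on new selector: " + fresh.keys().size());

        fresh.close();
        pipe.source().close();
        pipe.sink().close();
    }
}
```

The interest ops and the attachment are copied over because they belong to the key, not to the channel; a freshly registered key would otherwise start without them.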
Reposted from: https://blog.csdn.net/djydft2831djydft/article/details/113990071