mongos shuts down on its own or hangs

Question

21.13K views · 2022/01/19 · Sharding

ll_lc 24 2022/01/17 0 comments

Expiration times in system.keys

{
  "_id" : NumberLong("6807422601795731571"),
  "purpose" : "HMAC",
  "expiresAt" : ISODate("2020-06-21T15:16:26Z")
},
{
  "_id" : NumberLong("6807422601795731572"),
  "purpose" : "HMAC",
  "expiresAt" : ISODate("2020-09-19T15:16:26Z")
},
{
  "_id" : NumberLong("6855771186085757006"),
  "purpose" : "HMAC",
  "expiresAt" : ISODate("2020-12-18T15:16:26Z")
},
{
  "_id" : NumberLong("6889168851779452981"),
  "purpose" : "HMAC",
  "expiresAt" : ISODate("2021-03-18T15:16:26Z")
},
{
  "_id" : NumberLong("6922566517473149015"),
  "purpose" : "HMAC",
  "expiresAt" : ISODate("2021-06-16T15:16:26Z")
},
{
  "_id" : NumberLong("6945109559564304539"),
  "purpose" : "HMAC",
  "expiresAt" : ISODate("2021-09-14T15:16:26Z")
},
{
  "_id" : NumberLong("6989361848860540959"),
  "purpose" : "HMAC",
  "expiresAt" : ISODate("2021-12-13T15:16:26Z")
},
{
  "_id" : NumberLong("7022759514554236974"),
  "purpose" : "HMAC",
  "expiresAt" : ISODate("2022-03-13T15:16:26Z")
}

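After a stepDown on the config server, a new key was generated with a new expiration time: "expiresAt" : ISODate("2022-06-11T15:16:26Z")

After that switchover, two mongos instances still shut down abnormally or hung.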
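
The expiry times listed above can be sanity-checked offline. The following is only a sketch: the `EXPIRES_AT` list is copied by hand from the listing (plus the key created after the stepDown), and the helper names `parse_utc`, `unexpired_keys` and `rotation_gaps_days` are made up for illustration. It treats the values as plain UTC date strings, as shown in the post, and does not query the cluster.

```python
from datetime import datetime, timezone

# expiresAt values copied from the system.keys listing in the question,
# plus the key generated after the config server stepDown.
EXPIRES_AT = [
    "2020-06-21T15:16:26Z",
    "2020-09-19T15:16:26Z",
    "2020-12-18T15:16:26Z",
    "2021-03-18T15:16:26Z",
    "2021-06-16T15:16:26Z",
    "2021-09-14T15:16:26Z",
    "2021-12-13T15:16:26Z",
    "2022-03-13T15:16:26Z",
    "2022-06-11T15:16:26Z",
]


def parse_utc(value):
    """Parse an ISODate-style string such as 2022-03-13T15:16:26Z."""
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


def unexpired_keys(expires_at, now):
    """Return the expiry times that are still in the future at `now`."""
    return [t for t in map(parse_utc, expires_at) if t > now]


def rotation_gaps_days(expires_at):
    """Return the gap in whole days between consecutive expiry times."""
    times = sorted(map(parse_utc, expires_at))
    return [(later - earlier).days for earlier, later in zip(times, times[1:])]


if __name__ == "__main__":
    # First shutdown in the log: 2022-01-17 17:48:06 +0800, i.e. 09:48:06 UTC.
    incident = datetime(2022, 1, 17, 9, 48, 6, tzinfo=timezone.utc)
    for expiry in unexpired_keys(EXPIRES_AT, incident):
        print("still valid until", expiry.isoformat())
    print("gaps in days:", rotation_gaps_days(EXPIRES_AT))
```

With these values, two keys are still unexpired at the time of the first shutdown, and every gap between consecutive keys is 90 days. That suggests the key documents themselves were in order when the mongos went down, although it says nothing about whether each mongos had actually refreshed its cached keys.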

Log from one of the mongos instances:

2022-01-17T17:48:06.861+0800 W SHARDING [signalProcessingThread] error encountered while cleaning up distributed ping entry for GPMongoDB27:3309:1642375587:-2681998808238945381 :: caused by :: InterruptedAtShutdown: interrupted at shutdown
2022-01-17T17:48:06.861+0800 I FTDC [signalProcessingThread] Shutting down full-time diagnostic data capture
2022-01-17T17:48:06.864+0800 I CONTROL [signalProcessingThread] shutting down with code:0

mongos log when it was started again after the abnormal shutdown:

2022-01-17T17:49:02.031+0800 I CONTROL [main] ***** SERVER RESTARTED *****
2022-01-17T17:49:02.035+0800 I CONTROL [main] ** WARNING: You are running this process as the root user, which is not recommended.
2022-01-17T17:49:02.035+0800 I CONTROL [main]
2022-01-17T17:49:02.084+0800 I SHARDING [mongosMain] mongos version v3.6.9
2022-01-17T17:49:02.084+0800 I CONTROL [mongosMain] git version: 167861a164723168adfaaa866f310cb94010428f
2022-01-17T17:49:02.084+0800 I CONTROL [mongosMain] allocator: tcmalloc
2022-01-17T17:49:02.084+0800 I CONTROL [mongosMain] modules: none
2022-01-17T17:49:02.084+0800 I CONTROL [mongosMain] build environment:
2022-01-17T17:49:02.084+0800 I CONTROL [mongosMain] distarch: x86_64
2022-01-17T17:49:02.084+0800 I CONTROL [mongosMain] target_arch: x86_64
2022-01-17T17:49:02.084+0800 I CONTROL [mongosMain] db version v3.6.9
2022-01-17T17:49:02.084+0800 I CONTROL [mongosMain] git version: 167861a164723168adfaaa866f310cb94010428f
2022-01-17T17:49:02.084+0800 I CONTROL [mongosMain] allocator: tcmalloc
2022-01-17T17:49:02.084+0800 I CONTROL [mongosMain] modules: none
2022-01-17T17:49:02.084+0800 I CONTROL [mongosMain] build environment:
2022-01-17T17:49:02.084+0800 I CONTROL [mongosMain] distarch: x86_64
2022-01-17T17:49:02.084+0800 I CONTROL [mongosMain] target_arch: x86_64
2022-01-17T17:49:02.084+0800 I CONTROL [mongosMain] options: { config: "/mongo/mongos.conf", net: { bindIp: "172.*.*.27,127.0.0.1", maxIncomingConnections: 30000, port: 3309 }, processManagement: { fork: true, pidFilePath: "/var/run/mongo/mongos.pid" }, security: { clusterAuthMode: "keyFile", keyFile: "/mongo/db/secret" }, setParameter: { enableLocalhostAuthBypass: "true" }, sharding: { configDB: "cfg_shard/172.*.*.16:3307,172.*.*.15:3307,172.*.*.27:3307,172.*.*.28:3307" }, systemLog: { destination: "file", logAppend: true, path: "/mongo/log/mongos.log" } }
2022-01-17T17:49:02.089+0800 I NETWORK [mongosMain] Starting new replica set monitor for cfg_shard/172.*.*.15:3307,172.*.*.16:3307,172.*.*.27:3307,172.*.*.28:3307
2022-01-17T17:49:02.090+0800 I SHARDING [thread1] creating distributed lock ping thread for process GPMongoDB27:3309:1642412942:9147888126120134933 (sleeping for 30000ms)
2022-01-17T17:49:02.107+0800 I NETWORK [ReplicaSetMonitor-TaskExecutor-0] Successfully connected to 172.*.*.27:3307 (1 connections now open to 172.*.*.27:3307 with a 5 second timeout)
2022-01-17T17:49:02.107+0800 I NETWORK [mongosMain] Successfully connected to 172.*.*.16:3307 (1 connections now open to 172.*.*.16:3307 with a 5 second timeout)
2022-01-17T17:49:02.108+0800 I NETWORK [replSetDistLockPinger] Successfully connected to 172.*.*.28:3307 (1 connections now open to 172.*.*.28:3307 with a 5 second timeout)
2022-01-17T17:49:02.111+0800 I NETWORK [shard registry reload] Successfully connected to 172.*.*.15:3307 (1 connections now open to 172.*.*.15:3307 with a 5 second timeout)
2022-01-17T17:49:02.111+0800 I ASIO [NetworkInterfaceASIO-ShardRegistry-0] Connecting to 172.*.*.16:3307
2022-01-17T17:49:02.111+0800 I ASIO [NetworkInterfaceASIO-ShardRegistry-0] Connecting to 172.*.*.15:3307
2022-01-17T17:49:02.111+0800 I ASIO [NetworkInterfaceASIO-ShardRegistry-0] Connecting to 172.*.*.16:3307
2022-01-17T17:49:02.112+0800 I ASIO [NetworkInterfaceASIO-ShardRegistry-0] Connecting to 172.*.*.15:3307
2022-01-17T17:49:02.116+0800 I ASIO [NetworkInterfaceASIO-ShardRegistry-0] Successfully connected to 172.*.*.15:3307, took 5ms (2 connections now open to 172.*.*.15:3307)
2022-01-17T17:49:02.116+0800 I ASIO [NetworkInterfaceASIO-ShardRegistry-0] Successfully connected to 172.*.*.16:3307, took 5ms (2 connections now open to 172.*.*.16:3307)
2022-01-17T17:49:02.117+0800 I ASIO [NetworkInterfaceASIO-ShardRegistry-0] Successfully connected to 172.*.*.16:3307, took 6ms (2 connections now open to 172.*.*.16:3307)
2022-01-17T17:49:02.117+0800 I NETWORK [shard registry reload] Starting new replica set monitor for shard02/172.*.*.15:3306,172.*.*.27:3306
2022-01-17T17:49:02.117+0800 I NETWORK [shard registry reload] Starting new replica set monitor for shard01/172.*.*.16:3306,172.*.*.28:3306
2022-01-17T17:49:02.117+0800 I SHARDING [mongosMain] Waiting for signing keys, sleeping for 1s and trying again.
2022-01-17T17:49:02.118+0800 I ASIO [NetworkInterfaceASIO-ShardRegistry-0] Successfully connected to 172.*.*.15:3307, took 7ms (2 connections now open to 172.*.*.15:3307)
2022-01-17T17:49:02.120+0800 W SHARDING [replSetDistLockPinger] pinging failed for distributed lock pinger :: caused by :: LockStateChangeFailed: findAndModify query predicate didn't match any lock document
2022-01-17T17:49:02.130+0800 I NETWORK [ReplicaSetMonitor-TaskExecutor-0] Successfully connected to 172.*.*.27:3306 (1 connections now open to 172.*.*.27:3306 with a 5 second timeout)
2022-01-17T17:49:02.145+0800 I NETWORK [ReplicaSetMonitor-TaskExecutor-0] Successfully connected to 172.*.*.15:3306 (1 connections now open to 172.*.*.15:3306 with a 5 second timeout)
2022-01-17T17:49:02.160+0800 I NETWORK [ReplicaSetMonitor-TaskExecutor-0] Successfully connected to 172.*.*.28:3306 (1 connections now open to 172.*.*.28:3306 with a 5 second timeout)
2022-01-17T17:49:02.176+0800 I NETWORK [ReplicaSetMonitor-TaskExecutor-0] Successfully connected to 172.*.*.16:3306 (1 connections now open to 172.*.*.16:3306 with a 5 second timeout)
2022-01-17T17:49:03.118+0800 I ASIO [NetworkInterfaceASIO-ShardRegistry-0] Connecting to 172.*.*.28:3307
2022-01-17T17:49:03.120+0800 I ASIO [NetworkInterfaceASIO-ShardRegistry-0] Successfully connected to 172.*.*.28:3307, took 3ms (1 connections now open to 172.*.*.28:3307)
2022-01-17T17:49:03.121+0800 I ASIO [NetworkInterfaceASIO-ShardRegistry-0] Connecting to 172.*.*.27:3307
2022-01-17T17:49:03.124+0800 I ASIO [NetworkInterfaceASIO-ShardRegistry-0] Successfully connected to 172.*.*.27:3307, took 3ms (1 connections now open to 172.*.*.27:3307)
2022-01-17T17:49:03.125+0800 I FTDC [mongosMain] Initializing full-time diagnostic data capture with directory '/mongo/log/mongos.diagnostic.data'
2022-01-17T17:49:03.126+0800 I NETWORK [mongosMain] waiting for connections on port 3309
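
To put numbers on the outage, the timestamps at the start of each log line can be subtracted from one another. This is a minimal sketch that assumes every line begins with a timestamp in the format shown above; `log_time` is a hypothetical helper, and the three sample lines are copied from the logs in this post.

```python
from datetime import datetime

# Lines copied from the logs above: last line before exit, restart marker,
# and the point where mongos starts accepting connections again.
SHUTDOWN = "2022-01-17T17:48:06.864+0800 I CONTROL [signalProcessingThread] shutting down with code:0"
RESTART = "2022-01-17T17:49:02.031+0800 I CONTROL [main] ***** SERVER RESTARTED *****"
LISTENING = "2022-01-17T17:49:03.126+0800 I NETWORK [mongosMain] waiting for connections on port 3309"


def log_time(line):
    """Parse the leading timestamp of a mongos 3.6 log line."""
    stamp = line.split(" ", 1)[0]
    return datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S.%f%z")


if __name__ == "__main__":
    print("process was down for", log_time(RESTART) - log_time(SHUTDOWN))
    print("startup took", log_time(LISTENING) - log_time(RESTART))
```

The same helper can be pointed at any pair of lines, for example to compare shutdown times across the two affected mongos instances.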

Mr.Mongo changed the status to published 2022/01/19

2 Answers

ll_lc · Answer 1 · 2022-01-18T09:03:49+00:00

xiaoxu 1.13K posted 2022/01/18 10 comments

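This is a bug. You need to run stepDown periodically, or upgrade to 4.2.12, or to 4.4.3 or later. Versions before 4.2 are also said to be affected, but they can be left without intervention.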
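
The version thresholds named in this answer can be turned into a quick check. This is a sketch only: the thresholds are taken from the answer as written and have not been verified against release notes, the function names are invented for illustration, and treating every release line after 4.4 as fixed is an assumption.

```python
def parse_version(text):
    """Turn 'v3.6.9' or '4.4.3' into a tuple of ints such as (3, 6, 9)."""
    return tuple(int(part) for part in text.lstrip("v").split("."))


def has_fix_per_answer(version):
    """Return True if `version` meets the thresholds quoted in the answer.

    4.2.x needs to be at least 4.2.12; everything else needs to be at
    least 4.4.3 (assumption: later release lines also carry the fix).
    """
    parsed = parse_version(version)
    if parsed[:2] == (4, 2):
        return parsed >= (4, 2, 12)
    return parsed >= (4, 4, 3)


if __name__ == "__main__":
    # The mongos in the question reports "mongos version v3.6.9".
    for candidate in ["v3.6.9", "4.2.11", "4.2.12", "4.4.3"]:
        print(candidate, has_fix_per_answer(candidate))
```

For the 3.6.9 mongos in the question the check returns False, which is consistent with the answer's suggestion to either step down periodically or upgrade.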

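ll_lc posted a new comment 2022/01/19

ll_lc commented 2022/01/18

I already ran stepDown on the config server, and a new expiration time was generated. But two mongos instances still shut down on their own abnormally.
Could it be that they did not pick up the new signing keys?

xiaoxu commented 2022/01/18

Usually not. If the mongos itself is in a bad state, the only option is to restart it.

ll_lc commented 2022/01/18

It shut down on its own again today.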
2022-01-18T13:12:07.965+0800 I CONTROL [signalProcessingThread] got signal 15 (Terminated), will terminate after current cmd ends

2022-01-18T13:12:07.970+0800 I ASIO [NetworkInterfaceASIO-TaskExecutorPool-16-0] Ending connection to host 127.*.*.120.15:3306 due to bad connection status; 0 connections to that host remain open
2022-01-18T13:12:07.970+0800 I ASIO [NetworkInterfaceASIO-TaskExecutorPool-17-0] Ending connection to host 127.*.*.120.16:3306 due to bad connection status; 1 connections to that host remain open
2022-01-18T13:12:07.970+0800 I ASIO [NetworkInterfaceASIO-TaskExecutorPool-16-0] Ending connection to host 127.*.*.120.16:3306 due to bad connection status; 1 connections to that host remain open
2022-01-18T13:12:07.970+0800 I NETWORK [listener] connection accepted from 127.*.*.119.151:36930 #20662372 (7704 connections now open)
2022-01-18T13:12:07.970+0800 I NETWORK [listener] connection accepted from 127.*.*.119.151:36932 #20662373 (7705 connections now open)
2022-01-18T13:12:09.045+0800 W SHARDING [signalProcessingThread] error encountered while cleaning up distributed ping entry for GPMongoDB27:3309:1642412942:9147888126120134933 :: caused by :: InterruptedAtShutdown: interrupted at shutdown
2022-01-18T13:12:09.045+0800 I FTDC [signalProcessingThread] Shutting down full-time diagnostic data capture
2022-01-18T13:12:09.045+0800 I NETWORK [listener] connection accepted from 127.*.*.119.154:60852 #20662615 (7862 connections now open)
2022-01-18T13:12:09.046+0800 I NETWORK [conn20662615] received client metadata from 127.*.*.119.154:60852 conn20662615: { driver: { name: "PyMongo", version: "3.4.0" }, os: { type: "Linux", name: "CentOS Linux 7.4.1708 Core", architecture: "x86_64", version: "3.10.0-1160.el7.x86_64" }, platform: "CPython 3.6.6.final.0" }
2022-01-18T13:12:09.047+0800 I NETWORK [listener] connection accepted from 127.*.*.119.154:60854 #20662616 (7863 connections now open)
2022-01-18T13:12:09.048+0800 I NETWORK [conn20662616] received client metadata from 127.*.*.119.154:60854 conn20662616: { driver: { name: "PyMongo", version: "3.4.0" }, os: { type: "Linux", name: "CentOS Linux 7.4.1708 Core", architecture: "x86_64", version: "3.10.0-1160.el7.x86_64" }, platform: "CPython 3.6.6.final.0" }
2022-01-18T13:12:09.048+0800 I CONTROL [signalProcessingThread] shutting down with code:0

ll_lc commented 2022/01/18

mongos shut down on its own again today. Some of the key log lines:
2022-01-18T13:12:07.965+0800 I CONTROL [signalProcessingThread] got signal 15 (Terminated), will terminate after current cmd ends

2022-01-18T13:12:07.970+0800 I ASIO [NetworkInterfaceASIO-TaskExecutorPool-16-0] Ending connection to host 127.*.*.120.15:3306 due to bad connection status; 0 connections to that host remain open
2022-01-18T13:12:07.970+0800 I ASIO [NetworkInterfaceASIO-TaskExecutorPool-17-0] Ending connection to host 127.*.*.120.16:3306 due to bad connection status; 1 connections to that host remain open
2022-01-18T13:12:07.970+0800 I ASIO [NetworkInterfaceASIO-TaskExecutorPool-16-0] Ending connection to host 127.*.*.120.16:3306 due to bad connection status; 1 connections to that host remain open
2022-01-18T13:12:07.970+0800 I NETWORK [listener] connection accepted from 127.*.*.119.151:36930 #20662372 (7704 connections now open)

2022-01-18T13:12:09.045+0800 W SHARDING [signalProcessingThread] error encountered while cleaning up distributed ping entry for GPMongoDB27:3309:1642412942:9147888126120134933 :: caused by :: InterruptedAtShutdown: interrupted at shutdown
2022-01-18T13:12:09.045+0800 I FTDC [signalProcessingThread] Shutting down full-time diagnostic data capture
2022-01-18T13:12:09.045+0800 I NETWORK [listener] connection accepted from 127.*.*.119.154:60852 #20662615 (7862 connections now open)
2022-01-18T13:12:09.046+0800 I NETWORK [conn20662615] received client metadata from 127.*.*.119.154:60852 conn20662615: { driver: { name: "PyMongo", version: "3.4.0" }, os: { type: "Linux", name: "CentOS Linux 7.4.1708 Core", architecture: "x86_64", version: "3.10.0-1160.el7.x86_64" }, platform: "CPython 3.6.6.final.0" }
2022-01-18T13:12:09.047+0800 I NETWORK [listener] connection accepted from 127.*.*.119.154:60854 #20662616 (7863 connections now open)
2022-01-18T13:12:09.048+0800 I NETWORK [conn20662616] received client metadata from 127.*.*.119.154:60854 conn20662616: { driver: { name: "PyMongo", version: "3.4.0" }, os: { type: "Linux", name: "CentOS Linux 7.4.1708 Core", architecture: "x86_64", version: "3.10.0-1160.el7.x86_64" }, platform: "CPython 3.6.6.final.0" }
2022-01-18T13:12:09.048+0800 I CONTROL [signalProcessingThread] shutting down with code:0
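
The log above records "got signal 15 (Terminated)" followed by "shutting down with code:0". Signal 15 is SIGTERM, so the process was asked to stop by something outside mongos and then exited cleanly, rather than crashing. A small sketch for pulling those two facts out of a log follows; the regular expressions are written against the exact wording seen in this post, and the function names are hypothetical.

```python
import re

# Patterns matched against the wording in the mongos 3.6 log lines above.
SIGNAL_RE = re.compile(r"got signal (\d+) \((\w+)\)")
EXIT_RE = re.compile(r"shutting down with code:(-?\d+)")

# Two lines copied from the comment above.
SAMPLE = [
    "2022-01-18T13:12:07.965+0800 I CONTROL [signalProcessingThread] "
    "got signal 15 (Terminated), will terminate after current cmd ends",
    "2022-01-18T13:12:09.048+0800 I CONTROL [signalProcessingThread] "
    "shutting down with code:0",
]


def find_shutdown_signal(lines):
    """Return (number, name) of the first signal logged, or None."""
    for line in lines:
        match = SIGNAL_RE.search(line)
        if match:
            return int(match.group(1)), match.group(2)
    return None


def find_exit_code(lines):
    """Return the exit code from the first shutdown line, or None."""
    for line in lines:
        match = EXIT_RE.search(line)
        if match:
            return int(match.group(1))
    return None


if __name__ == "__main__":
    print("signal:", find_shutdown_signal(SAMPLE))
    print("exit code:", find_exit_code(SAMPLE))
```

If the signal line is present, the next thing to look for is whatever sent it: a service manager, a monitoring script, or an operator.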

xiaoxu commented 2022/01/18

Did you start it with systemctl?
systemctl --no-block start mongod
