本文共 8343 字,大约阅读时间需要 27 分钟。
PostgreSQL , Greenplum , segment , utility
有些时候,我们在使用Greenplum时,你会发现明明连接数是够的,但是报错的时候segment连接满无法正确连接,因为对外的接口都是MASTER,如果发现这里问题如何排查?
00000 LOG: Failed connection to seg0 00000 STATEMENT: INSERT INTO gps_track( 58M01 LOG: (58M01) Master unable to connect to seg0 with options : FATAL: Sorry, too many clients already 58M01 STATEMENT: INSERT INTO gps_track( 00000 LOG: Failed connection to seg1 00000 STATEMENT: INSERT INTO gps_track( 58M01 LOG: (58M01) Master unable to connect to seg1 with options : FATAL: Sorry, too many clients already 58M01 STATEMENT: INSERT INTO gps_track( 00000 LOG: The previous session was reset because its gang was disconnected (session id = 318465). The new session id = 318471 00000 STATEMENT: INSERT INTO gps_track( 00000 LOG: FTS rescanned, get new component databases info. 00000 STATEMENT: INSERT INTO gps_track( 00000 LOG: Failed connection to seg0 00000 STATEMENT: INSERT INTO gps_track( 58M01 LOG: (58M01) Master unable to connect to seg0 with options : FATAL: Sorry, too many clients already 58M01 STATEMENT: INSERT INTO gps_track( 00000 LOG: Failed connection to seg1 00000 STATEMENT: INSERT INTO gps_track( 58M01 LOG: (58M01) Master unable to connect to seg1 with options : FATAL: Sorry, too many clients already 58M01 STATEMENT: INSERT INTO gps_track( 00000 LOG: The previous session was reset because its gang was disconnected (session id = 318471). The new session id = 318482 00000 STATEMENT: INSERT INTO gps_track( 00000 LOG: no primary segworker group allocated 00000 STATEMENT: INSERT INTO gps_track( XX000 ERROR: (XX000) Unexpected internal error (cdbgang.c:1636) XX000 LOCATION: allocateWriterGang, cdbgang.c:1636 XX000 STATEMENT: INSERT INTO gps_track( XX000 Traceback 0: postgres: port INSERT() [0x9099af] XX000 Traceback 1: postgres: port INSERT(elog_internalerror+0x20) [0x90cdc0] XX000 Traceback 2: postgres: port INSERT(allocateWriterGang+0x248) [0x9d6cb8] XX000 Traceback 3: postgres: port INSERT(AssignGangs+0x28b) [0x6b689b] XX000 Traceback 4: postgres: port INSERT(ExecutorStart+0xe5f) [0x69e28f] XX000 Traceback 5: postgres: port INSERT() [0x8304b7] XX000 Traceback 6: postgres: port INSERT() [0x830c54] XX000 Traceback 7: postgres: port INSERT(PortalRun+0x322) [0x8311c2] XX000 Traceback 8: postgres: port INSERT() [0x82abc1] XX000 Traceback 9: postgres: port INSERT(PostgresMain+0xbd8) [0x82d418] XX000 Traceback 10: postgres: port INSERT() [0x7d3637] XX000 Traceback 11: postgres: port INSERT(PostmasterMain+0xd2e) [0x7d514e] XX000 Traceback 12: postgres: port INSERT(main+0x193) [0x730363] XX000 Traceback 13: /lib64/libc.so.6(__libc_start_main+0xfd) [0x2b6682187d5d] XX000 Traceback 14: postgres: port INSERT() [0x4d6389]
备库到底是什么状态?
通过utility模式连接(前提是预留了足够多的super user connection = superuser_reserved_connections),允许直接连到SEGMENT节点。
PGOPTIONS='-c gp_session_role=utility' psql -h xxx -p xxx -U xxx xxx
到segment节点一探究竟。
1、当前系统中的会话,评判标准:不能超过当前segment节点配置的max_connections 否则就会报连接不足。
select * from pg_stat_activity;
2、2PC会话,评判标准:不能超过当前segment节点配置的max_prepared_transactions 否则就会报连接不足。
postgres=# select * from pg_prepared_xacts ; transaction | gid | prepared | owner | database -------------+-----+----------+-------+---------- (0 rows)
3、segment级的锁等待状态
4、segment级日志排查。
cd $PGDATA/pg_log查看日志
5、查看日志时,注意对应session id,可以找出它在master, primary segment上的所有相关日志。
日志格式如下:
src/backend/postmaster/syslogger.c/* * syslogger_write_errordata * Write the GpErrorData to the log. */static voidsyslogger_write_errordata(PipeProtoHeader *chunkHeader, GpErrorData *errorData, bool csv){ syslogger_append_current_timestamp(true); /* username */ syslogger_write_str_with_comma(errorData->username, true, csv, true); /* databasename */ syslogger_write_str_with_comma(errorData->databasename, true, csv, true); /* Process id, thread id */ syslogger_write_int32(false, "p", chunkHeader->pid, true, true); syslogger_write_int32(false, "th", chunkHeader->thid, true, true); /* Remote host */ syslogger_write_str_with_comma(errorData->remote_host, true, csv, true); /* Remote port */ syslogger_write_str_with_comma(errorData->remote_port, true, csv, true); /* session start timestamp */ syslogger_append_timestamp(errorData->fix_fields.session_start_time, true, true); /* Transaction id */ syslogger_write_int32(false, "", errorData->fix_fields.top_trans_id, true, true); /* GPDB specific options. */ syslogger_write_int32(true, "con", errorData->fix_fields.gp_session_id, true, true); syslogger_write_int32(true, "cmd", errorData->fix_fields.gp_command_count, true, true); syslogger_write_int32(false, errorData->fix_fields.gp_is_primary == 't'? "seg" : "mir", errorData->fix_fields.gp_segment_id, true, true); syslogger_write_int32(true, "slice", errorData->fix_fields.slice_id, true, true); syslogger_write_int32(true, "dx", errorData->fix_fields.dist_trans_id, true, true); syslogger_write_int32(true, "x", errorData->fix_fields.local_trans_id, true, true); syslogger_write_int32(true, "sx", errorData->fix_fields.subtrans_id, true, true); /* error severity */ syslogger_write_str_with_comma(errorData->error_severity, true, csv, true); /* sql state code */ syslogger_write_str_with_comma(errorData->sql_state, true, csv, true); /* errmsg */ syslogger_write_str_with_comma(errorData->error_message, true, csv, true); /* errdetail */ syslogger_write_str_with_comma(errorData->error_detail, true, csv, true); /* errhint */ syslogger_write_str_with_comma(errorData->error_hint, true, csv, true); /* internal query */ syslogger_write_str_with_comma(errorData->internal_query, true, csv, true); /* internal query pos */ syslogger_write_int32(true, "", errorData->fix_fields.internal_query_pos, true, true); /* err ctxt */ syslogger_write_str_with_comma(errorData->error_context, true, csv, true); /* user query */ syslogger_write_str_with_comma(errorData->debug_query_string, true, csv, true); /* cursor pos */ syslogger_write_int32(false, "", errorData->fix_fields.error_cursor_pos, true, true); /* func name */ syslogger_write_str_with_comma(errorData->error_func_name, true, csv, true); /* file name */ syslogger_write_str_with_comma(errorData->error_filename, true, csv, true); /* line number */ syslogger_write_int32(true, "", errorData->fix_fields.error_fileline, true, true); /* stack trace */ if (errorData->stacktrace != NULL) { if (csv) { write_syslogger_file_binary("\"", 1, LOG_DESTINATION_STDERR); } syslogger_write_str(errorData->stacktrace, strlen(errorData->stacktrace), true, csv); if (csv) { write_syslogger_file_binary("\"", 1, LOG_DESTINATION_STDERR); } } /* EOL */ write_syslogger_file_binary(LOG_EOL, strlen(LOG_EOL), LOG_DESTINATION_STDERR); /* * Send alerts when needed. The alerts are sent only by the master. * If the alert is failed for whatever reason, log a message and continue. */ if (errorData->fix_fields.send_alert == 't' && Gp_entry_postmaster && Gp_role == GP_ROLE_DISPATCH) { PG_TRY(); { send_alert(errorData); } PG_CATCH(); { elog(LOG,"Failed to send alert."); } PG_END_TRY(); }}
另一种方式,我们也能够使用gp_dist_random下发指令给SEGMENT节点,(通过SQL接口连接到master节点调用SQL即可)
转载地址:http://ungum.baihongyu.com/