The bgwriter.c source contains the following line:
pqsignal(SIGQUIT, bg_quickdie); /* hard crash time */
[Author: 技术者高健 @ 博客园 (cnblogs), mail: ]
And also this function:
/*
 * bg_quickdie() occurs when signalled SIGQUIT by the postmaster.
 *
 * Some backend has bought the farm,
 * so we need to stop what we're doing and exit.
 */
static void
bg_quickdie(SIGNAL_ARGS)
{
    PG_SETMASK(&BlockSig);

    /*
     * We DO NOT want to run proc_exit() callbacks -- we're here because
     * shared memory may be corrupted, so we don't want to try to clean up our
     * transaction. Just nail the windows shut and get out of town. Now that
     * there's an atexit callback to prevent third-party code from breaking
     * things by calling exit() directly, we have to reset the callbacks
     * explicitly to make this work as intended.
     */
    on_exit_reset();

    /*
     * Note we do exit(2) not exit(0). This is to force the postmaster into a
     * system reset cycle if some idiot DBA sends a manual SIGQUIT to a random
     * backend. This is necessary precisely because we don't clean up our
     * shared memory state. (The "dead man switch" mechanism in pmsignal.c
     * should ensure the postmaster sees this as a crash, too, but no harm in
     * being doubly sure.)
     */
    exit(2);
}
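The pattern above (register a SIGQUIT handler, then leave immediately with exit code 2 without running cleanup) can be reproduced outside PostgreSQL. The following is a minimal standalone POSIX sketch, not PostgreSQL code: it assumes plain sigaction() in place of PostgreSQL's pqsignal() wrapper, and it uses _exit(2) to approximate the effect of on_exit_reset() followed by exit(2). The names quickdie_sketch, cleanup_that_must_not_run and run_quickdie_demo are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for proc_exit()/atexit cleanup. If it ever ran,
 * the child would report exit code 99 instead of 2. */
static void cleanup_that_must_not_run(void)
{
    _exit(99);
}

/* Sketch of bg_quickdie(): leave at once with status 2. _exit() is
 * async-signal-safe and skips atexit() handlers, approximating
 * on_exit_reset() + exit(2). */
static void quickdie_sketch(int signo)
{
    (void) signo;
    _exit(2);
}

/* Fork a child that idles until SIGQUIT, send it SIGQUIT, and return the
 * child's exit code, or -1 if something went wrong. */
static int run_quickdie_demo(void)
{
    int fds[2];
    char c;
    int status;
    pid_t pid;

    if (pipe(fds) != 0)
        return -1;
    pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        struct sigaction sa;
        sigset_t unblock;

        memset(&sa, 0, sizeof sa);
        sa.sa_handler = quickdie_sketch;   /* assumption: stands in for pqsignal() */
        sigemptyset(&sa.sa_mask);
        sigaction(SIGQUIT, &sa, NULL);

        sigemptyset(&unblock);             /* make sure SIGQUIT can be delivered */
        sigaddset(&unblock, SIGQUIT);
        sigprocmask(SIG_UNBLOCK, &unblock, NULL);

        atexit(cleanup_that_must_not_run);

        close(fds[0]);
        if (write(fds[1], "r", 1) != 1)    /* tell the parent we are ready */
            _exit(3);
        for (;;)
            pause();                       /* idle like a background process */
    }

    close(fds[1]);
    if (read(fds[0], &c, 1) != 1) {        /* wait until the handler is installed */
        close(fds[0]);
        kill(pid, SIGKILL);
        waitpid(pid, NULL, 0);
        return -1;
    }
    close(fds[0]);

    kill(pid, SIGQUIT);                    /* same effect as: kill -s SIGQUIT <pid> */

    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Called from main() or a test, run_quickdie_demo() should return 2; had the atexit() handler run, the result would be 99 instead. The exit code 2 is exactly what the postmaster later reports in its log.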
Here is how I tested it.
First, I added one line of code to bg_quickdie() in bgwriter.c, so that it reads:
/*
 * bg_quickdie() occurs when signalled SIGQUIT by the postmaster.
 *
 * Some backend has bought the farm,
 * so we need to stop what we're doing and exit.
 */
static void
bg_quickdie(SIGNAL_ARGS)
{
    /* Added for this experiment only. fprintf() is not async-signal-safe,
     * so this is acceptable only as a throwaway debugging aid. */
    fprintf(stderr, "bg_quickdie happend.\n");

    PG_SETMASK(&BlockSig);

    /*
     * We DO NOT want to run proc_exit() callbacks -- we're here because
     * shared memory may be corrupted, so we don't want to try to clean up our
     * transaction. Just nail the windows shut and get out of town. Now that
     * there's an atexit callback to prevent third-party code from breaking
     * things by calling exit() directly, we have to reset the callbacks
     * explicitly to make this work as intended.
     */
    on_exit_reset();

    /*
     * Note we do exit(2) not exit(0). This is to force the postmaster into a
     * system reset cycle if some idiot DBA sends a manual SIGQUIT to a random
     * backend. This is necessary precisely because we don't clean up our
     * shared memory state. (The "dead man switch" mechanism in pmsignal.c
     * should ensure the postmaster sees this as a crash, too, but no harm in
     * being doubly sure.)
     */
    exit(2);
}
Next, I started PostgreSQL and checked the process list:
[postgres@localhost bin]$ ./postgres -D /usr/local/pgsql/data
LOG: database system was shut down at 2012-10-31 10:25:11 CST
LOG: autovacuum launcher started
LOG: database system is ready to accept connections
[root@localhost postgresql-9.2.0]# ps -ef|grep post
root 2928 2897 0 10:34 pts/1 00:00:00 su - postgres
postgres 2929 2928 0 10:34 pts/1 00:00:00 -bash
postgres 2967 2929 0 10:34 pts/1 00:00:00 ./postgres -D /usr/local/pgsql/data
postgres 2969 2967 0 10:34 ? 00:00:00 postgres: checkpointer process
postgres 2970 2967 0 10:34 ? 00:00:00 postgres: writer process
postgres 2971 2967 0 10:34 ? 00:00:00 postgres: wal writer process
postgres 2972 2967 0 10:34 ? 00:00:00 postgres: autovacuum launcher process
postgres 2973 2967 0 10:34 ? 00:00:00 postgres: stats collector process
root 3000 2977 0 10:35 pts/2 00:00:00 grep post
[root@localhost postgresql-9.2.0]#
Then I sent SIGQUIT to the bgwriter (the "writer process", PID 2970):
[root@localhost postgresql-9.2.0]# kill -s SIGQUIT 2970
What shows up on pts/1, the terminal where postgres is running, at this point?
bg_quickdie happend.
LOG: background writer process (PID 2970) exited with exit code 2
LOG: terminating any other active server processes
WARNING: terminating connection because of crash of another server process
DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT: In a moment you should be able to reconnect to the database and repeat your command.
LOG: all server processes terminated; reinitializing
LOG: database system was interrupted; last known up at 2012-10-31 10:34:47 CST
LOG: database system was not properly shut down; automatic recovery in progress
LOG: record with zero length at 0/192D458
LOG: redo is not required
LOG: autovacuum launcher started
LOG: database system is ready to accept connections
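In other words, the bgwriter caught SIGQUIT, ran bg_quickdie() and exited with code 2. The postmaster (the postgres process, PID 2967) treated this as a crash: it terminated the other server processes, ran automatic recovery, and then restarted all of its child processes!
Let's run ps again to confirm: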
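The postmaster side can be sketched the same way. In the 9.2 source this logic lives in postmaster.c, where, as far as I can tell, reaper() treats any nonzero exit of the background writer as a crash and calls HandleChildCrash(). The toy supervisor below only mimics the behavior visible in the log: one child exits with a nonzero code, the survivors are sent SIGQUIT, and once all are gone a fresh set is started. It skips shared-memory reinitialization and crash recovery entirely, and NCHILD, child_quickdie, start_child and reap_and_maybe_restart are hypothetical names, not PostgreSQL APIs.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define NCHILD 3 /* hypothetical: stands in for checkpointer, writer, wal writer */

/* Child-side handler: like bg_quickdie(), leave at once with status 2. */
static void child_quickdie(int signo)
{
    (void) signo;
    _exit(2);
}

/* Fork one idle child that exits with status 2 on SIGQUIT.
 * Returns the child's PID, or -1 on error. */
static pid_t start_child(void)
{
    int fds[2];
    char c;
    pid_t pid;

    if (pipe(fds) != 0)
        return -1;
    pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        struct sigaction sa;
        sigset_t set;

        memset(&sa, 0, sizeof sa);
        sa.sa_handler = child_quickdie;
        sigemptyset(&sa.sa_mask);
        sigaction(SIGQUIT, &sa, NULL);
        sigemptyset(&set);
        sigaddset(&set, SIGQUIT);
        sigprocmask(SIG_UNBLOCK, &set, NULL);

        close(fds[0]);
        if (write(fds[1], "r", 1) != 1)   /* signal readiness to the parent */
            _exit(3);
        for (;;)
            pause();
    }
    close(fds[1]);
    if (read(fds[0], &c, 1) != 1) {       /* wait until the handler is in place */
        kill(pid, SIGKILL);
        waitpid(pid, NULL, 0);
        pid = -1;
    }
    close(fds[0]);
    return pid;
}

/* Toy reaper: wait for one child to exit. A nonzero exit of one of our
 * children is treated as a crash: SIGQUIT the survivors, reap them, and
 * start a fresh set. Returns 1 after a crash-restart cycle, 0 if nothing
 * was restarted, -1 on error. */
static int reap_and_maybe_restart(pid_t kids[NCHILD])
{
    int status, i, ours = 0;
    pid_t dead = waitpid(-1, &status, 0);

    if (dead < 0)
        return -1;
    for (i = 0; i < NCHILD; i++)
        if (kids[i] == dead) {
            kids[i] = 0;
            ours = 1;
        }
    if (!ours || (WIFEXITED(status) && WEXITSTATUS(status) == 0))
        return 0;                         /* unknown child or clean exit */

    /* cf. "terminating any other active server processes" */
    for (i = 0; i < NCHILD; i++)
        if (kids[i] > 0)
            kill(kids[i], SIGQUIT);
    for (i = 0; i < NCHILD; i++)
        if (kids[i] > 0) {
            waitpid(kids[i], NULL, 0);
            kids[i] = 0;
        }

    /* cf. "all server processes terminated; reinitializing" */
    for (i = 0; i < NCHILD; i++)
        if ((kids[i] = start_child()) < 0)
            return -1;
    return 1;
}
```

A caller would start NCHILD children with start_child(), send SIGQUIT to one of them, and call reap_and_maybe_restart(); it should return 1 and leave kids[] holding new PIDs, much like the new child PIDs under the unchanged postmaster PID 2967 in the ps output that follows.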
[root@localhost postgresql-9.2.0]# ps -ef|grep post
root 2928 2897 0 10:34 pts/1 00:00:00 su - postgres
postgres 2929 2928 0 10:34 pts/1 00:00:00 -bash
postgres 2967 2929 0 10:34 pts/1 00:00:00 ./postgres -D /usr/local/pgsql/data
postgres 3002 2967 0 10:35 ? 00:00:00 postgres: checkpointer process
postgres 3003 2967 0 10:35 ? 00:00:00 postgres: writer process
postgres 3004 2967 0 10:35 ? 00:00:00 postgres: wal writer process
postgres 3005 2967 0 10:35 ? 00:00:00 postgres: autovacuum launcher process
postgres 3006 2967 0 10:35 ? 00:00:00 postgres: stats collector process
root 3010 2977 0 10:36 pts/2 00:00:00 grep post
[root@localhost postgresql-9.2.0]#
The end.