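NestJS Health Checks

Learning Objectives

  • Use the NestJS health check module
  • Understand what health indicators are and when to use them
  • Integrate health checks with a monitoring system
  • Distinguish readiness checks from liveness checks
  • Apply health check best practices and troubleshoot common problems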

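Core Concepts

1. Introduction to Health Checks

Health checks are a key way to monitor an application's state; they help us detect and resolve problems early. In NestJS, health checks are provided by the @nestjs/terminus package. The module supports the following:

  • Checking whether the application itself is running
  • Checking the state of dependencies (databases, Redis, and so on)
  • Reporting the results as a structured JSON response that dashboards and other tools can consume
  • Feeding monitoring systems such as Prometheus and Grafana through a separate metrics client (see section 7)
  • Backing Kubernetes readiness and liveness probes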

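2. Installation and Configuration

First, install the health check module:

npm install --save @nestjs/terminus

Then import the module in the application's root module. The health controller created in section 3 must also be listed under controllers in whichever module imports TerminusModule; otherwise its routes are never registered.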

// src/app.module.ts
import { Module } from '@nestjs/common';
import { TerminusModule } from '@nestjs/terminus';
import { AppController } from './app.controller';
import { AppService } from './app.service';

@Module({
  imports: [TerminusModule],
  controllers: [AppController],
  providers: [AppService],
})
export class AppModule {}

3. Basic Usage

3.1 Creating a Health Check Controller

// src/health/health.controller.ts
import { Controller, Get } from '@nestjs/common';
import { HealthCheck, HealthCheckService } from '@nestjs/terminus';

@Controller('health')
export class HealthController {
  constructor(private health: HealthCheckService) {}

  @Get()
  @HealthCheck()
  async check() {
    return this.health.check([]);
  }
}

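3.2 Adding Health Indicators

We can add various health indicators, such as a database connection, a Redis connection, or an HTTP service. Note that HttpHealthIndicator depends on the @nestjs/axios package (with HttpModule imported), and TypeOrmHealthIndicator depends on a configured TypeOrmModule.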

// src/health/health.controller.ts
import { Controller, Get } from '@nestjs/common';
import { HealthCheck, HealthCheckService, HttpHealthIndicator, TypeOrmHealthIndicator } from '@nestjs/terminus';

@Controller('health')
export class HealthController {
  constructor(
    private health: HealthCheckService,
    private http: HttpHealthIndicator,
    private db: TypeOrmHealthIndicator,
  ) {}

  @Get()
  @HealthCheck()
  async check() {
    return this.health.check([
      () => this.http.pingCheck('nestjs-docs', 'https://docs.nestjs.com'),
      () => this.db.pingCheck('database'),
    ]);
  }
}
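
To make the shape of the response concrete, the sketch below imitates how a list of indicator functions could be combined into a single result with status, info, error, and details fields. It is a simplified, dependency-free illustration and not the Terminus source; runChecks, Indicator, and IndicatorFailure are hypothetical names introduced only for this example.

```typescript
type IndicatorState = { status: 'up' | 'down'; [detail: string]: unknown };
type IndicatorResult = Record<string, IndicatorState>;
type Indicator = () => Promise<IndicatorResult>;

// Hypothetical stand-in for an error that carries the failing indicator's result.
class IndicatorFailure extends Error {
  constructor(message: string, public readonly causes: IndicatorResult) {
    super(message);
  }
}

interface HealthSummary {
  status: 'ok' | 'error';
  info: IndicatorResult; // indicators that reported "up"
  error: IndicatorResult; // indicators that reported "down"
  details: IndicatorResult; // every indicator
}

async function runChecks(indicators: Indicator[]): Promise<HealthSummary> {
  const info: IndicatorResult = {};
  const error: IndicatorResult = {};
  // Start every indicator at once; a failing one must not hide the others.
  await Promise.all(
    indicators.map(async (indicator) => {
      try {
        Object.assign(info, await indicator());
      } catch (err) {
        if (!(err instanceof IndicatorFailure)) throw err; // not a health result
        Object.assign(error, err.causes);
      }
    }),
  );
  return {
    status: Object.keys(error).length > 0 ? 'error' : 'ok',
    info,
    error,
    details: { ...info, ...error },
  };
}
```

With an empty indicator list, as in section 3.1, this sketch yields a status of ok with empty info, error, and details objects. A single failing indicator is enough to turn the overall status into error.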

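4. Health Indicators

The NestJS health check module ships with built-in health indicators, including:

  • HttpHealthIndicator: checks the state of an HTTP service
  • TypeOrmHealthIndicator: checks a TypeORM database connection
  • MongooseHealthIndicator: checks a MongoDB connection
  • MicroserviceHealthIndicator: checks a microservice over a given transport
  • MemoryHealthIndicator and DiskHealthIndicator: check memory and disk usage

There is no dedicated built-in Redis indicator. Redis is usually checked either through MicroserviceHealthIndicator with the Redis transport or through a custom indicator.

5. Custom Health Indicators

Besides the built-in indicators, we can also create custom health indicators.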

// src/health/custom.health.ts
import { Injectable } from '@nestjs/common';
import { HealthIndicator, HealthIndicatorResult, HealthCheckError } from '@nestjs/terminus';

@Injectable()
export class CustomHealthIndicator extends HealthIndicator {
  async isHealthy(key: string, options: { threshold: number }): Promise<HealthIndicatorResult> {
    // Simulated check logic (replace with a real measurement)
    const healthStatus = Math.random() > options.threshold;
    
    const result = this.getStatus(key, healthStatus, {
      message: healthStatus ? 'Service is healthy' : 'Service is unhealthy',
    });
    
    if (!healthStatus) {
      throw new HealthCheckError('Custom health check failed', result);
    }
    
    return result;
  }
}

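Then use it in the health check controller (CustomHealthIndicator must also be listed under providers in the module that declares the controller):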

// src/health/health.controller.ts
import { Controller, Get } from '@nestjs/common';
import { HealthCheck, HealthCheckService } from '@nestjs/terminus';
import { CustomHealthIndicator } from './custom.health';

@Controller('health')
export class HealthController {
  constructor(
    private health: HealthCheckService,
    private customHealthIndicator: CustomHealthIndicator,
  ) {}

  @Get()
  @HealthCheck()
  async check() {
    return this.health.check([
      () => this.customHealthIndicator.isHealthy('custom-service', { threshold: 0.5 }),
    ]);
  }
}
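
The Math.random() call in the indicator is only a placeholder. A real indicator usually measures something and compares it with a limit. The following is a minimal sketch of that decision as a pure function, assuming a hypothetical numeric measurement such as a queue depth; evaluateThreshold and its field names are illustrative and not part of @nestjs/terminus.

```typescript
type IndicatorState = { status: 'up' | 'down'; [detail: string]: unknown };
type IndicatorResult = Record<string, IndicatorState>;

// Build a result in the `{ [key]: { status, ...details } }` shape,
// reporting "up" only while the measured value stays within the limit.
function evaluateThreshold(key: string, measured: number, max: number): IndicatorResult {
  const healthy = measured <= max;
  return {
    [key]: {
      status: healthy ? 'up' : 'down',
      measured,
      max,
    },
  };
}
```

Inside isHealthy, the same comparison would replace Math.random() > options.threshold, with the outcome passed to getStatus and a HealthCheckError thrown when the value exceeds the limit.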

6. Health Check Configuration

6.1 Basic Configuration

We can pass basic options when importing the health check module:

// src/app.module.ts
import { Module } from '@nestjs/common';
import { TerminusModule } from '@nestjs/terminus';
import { AppController } from './app.controller';
import { AppService } from './app.service';

@Module({
  imports: [
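    TerminusModule.forRoot({
      // `logger` accepts a boolean or a LoggerService class, not the console object
      logger: true,
      // 'pretty' prints a readable summary of failed checks instead of JSON
      errorLogStyle: 'pretty',
    }),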
  ],
  controllers: [AppController],
  providers: [AppService],
})
export class AppModule {}

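6.2 Custom Response Format

We can customize the format of the health check response. Because @Res() puts the handler in library-specific mode, the handler has to send the response itself: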

// src/health/health.controller.ts
import { Controller, Get, Res } from '@nestjs/common';
import { Response } from 'express';
import { HealthCheck, HealthCheckService, HttpHealthIndicator } from '@nestjs/terminus';

@Controller('health')
export class HealthController {
  constructor(
    private health: HealthCheckService,
    private http: HttpHealthIndicator,
  ) {}

  @Get()
  @HealthCheck()
  async check(@Res() res: Response) {
    try {
      const result = await this.health.check([
        () => this.http.pingCheck('nestjs-docs', 'https://docs.nestjs.com'),
      ]);
      return res.json(result);
    } catch (error) {
      return res.status(503).json({
        status: 'error',
        error: error.message,
        timestamp: new Date().toISOString(),
      });
    }
  }
}

7. Integrating with Monitoring Systems

7.1 Integrating with Prometheus

First, install the Prometheus client:

npm install --save prom-client

Then create a Prometheus metrics service:

// src/metrics/metrics.service.ts
import { Injectable } from '@nestjs/common';
import { register, Counter, Gauge, Histogram } from 'prom-client';

@Injectable()
export class MetricsService {
  private readonly httpRequestsTotal: Counter<string>;
  private readonly httpRequestDurationSeconds: Histogram<string>;
  private readonly appHealth: Gauge<string>;

  constructor() {
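    // Clear the global registry so that creating this service again
    // (for example in tests) does not fail on duplicate metric names
    register.clear();
    
    // Set default labels
    register.setDefaultLabels({
      app: 'nestjs-application',
    });
    
    // Create the metrics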
    this.httpRequestsTotal = new Counter({
      name: 'http_requests_total',
      help: 'Total number of HTTP requests',
      labelNames: ['method', 'route', 'status'],
    });
    
    this.httpRequestDurationSeconds = new Histogram({
      name: 'http_request_duration_seconds',
      help: 'HTTP request duration in seconds',
      labelNames: ['method', 'route', 'status'],
      buckets: [0.1, 0.5, 1, 2, 5],
    });
    
    this.appHealth = new Gauge({
      name: 'app_health',
      help: 'Application health status',
      labelNames: ['service'],
    });
  }

  // Record an HTTP request
  recordHttpRequest(method: string, route: string, status: number, duration: number) {
    this.httpRequestsTotal.labels(method, route, status.toString()).inc();
    this.httpRequestDurationSeconds.labels(method, route, status.toString()).observe(duration);
  }

  // Set the application health status (1 = healthy, 0 = unhealthy)
  setAppHealth(service: string, status: number) {
    this.appHealth.labels(service).set(status);
  }

  // Return all metrics in the Prometheus text format
  async getMetrics() {
    return register.metrics();
  }
}

Create the metrics controller:

// src/metrics/metrics.controller.ts
import { Controller, Get, Res } from '@nestjs/common';
import { Response } from 'express';
import { MetricsService } from './metrics.service';

@Controller('metrics')
export class MetricsController {
  constructor(private readonly metricsService: MetricsService) {}

  @Get()
  async getMetrics(@Res() res: Response) {
    const metrics = await this.metricsService.getMetrics();
    res.set('Content-Type', 'text/plain');
    res.send(metrics);
  }
}
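
For orientation, the text served at /metrics follows the Prometheus text exposition format. The sketch below renders a gauge in that format by hand, purely to show what a scraper receives. prom-client produces this output for you, and renderGauge, Sample, and escapeLabel are hypothetical names used only here.

```typescript
interface Sample {
  labels: Record<string, string>;
  value: number;
}

// Escape backslashes, double quotes, and newlines inside a label value.
const escapeLabel = (v: string): string =>
  v.replace(/\\/g, '\\\\').replace(/"/g, '\\"').replace(/\n/g, '\\n');

function renderGauge(name: string, help: string, samples: Sample[]): string {
  const lines = [`# HELP ${name} ${help}`, `# TYPE ${name} gauge`];
  for (const { labels, value } of samples) {
    const pairs = Object.keys(labels)
      .map((k) => `${k}="${escapeLabel(labels[k])}"`)
      .join(',');
    // A sample without labels is written without the braces.
    lines.push(pairs ? `${name}{${pairs}} ${value}` : `${name} ${value}`);
  }
  return lines.join('\n') + '\n';
}
```

Calling renderGauge('app_health', 'Application health status', [{ labels: { service: 'database' }, value: 1 }]) produces a HELP line, a TYPE line, and the sample line app_health{service="database"} 1.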

Create the metrics module:

// src/metrics/metrics.module.ts
import { Module } from '@nestjs/common';
import { MetricsController } from './metrics.controller';
import { MetricsService } from './metrics.service';

@Module({
  controllers: [MetricsController],
  providers: [MetricsService],
  exports: [MetricsService],
})
export class MetricsModule {}

Import the metrics module into the application module:

// src/app.module.ts
import { Module } from '@nestjs/common';
import { TerminusModule } from '@nestjs/terminus';
import { MetricsModule } from './metrics/metrics.module';
import { AppController } from './app.controller';
import { AppService } from './app.service';

@Module({
  imports: [
    TerminusModule,
    MetricsModule,
  ],
  controllers: [AppController],
  providers: [AppService],
})
export class AppModule {}
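
The metrics service exposes setAppHealth, but nothing in this section calls it yet. One possible bridge, sketched here under the assumption that a health result carries a details object keyed by service name, converts each status into the 1/0 value a gauge expects. toGaugeValues is a hypothetical helper, not part of either library.

```typescript
type IndicatorState = { status: 'up' | 'down'; [detail: string]: unknown };
type HealthDetails = Record<string, IndicatorState>;

// Map every checked service to 1 (up) or 0 (down).
function toGaugeValues(details: HealthDetails): Record<string, number> {
  const values: Record<string, number> = {};
  for (const service of Object.keys(details)) {
    values[service] = details[service].status === 'up' ? 1 : 0;
  }
  return values;
}
```

A caller could then loop over the returned object and invoke setAppHealth(service, value) for each entry, so that the app_health gauge tracks the latest check.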

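7.2 Integrating with Grafana

We can use Grafana to visualize the Prometheus metrics. First configure Prometheus as a data source, then create a Grafana dashboard.

8. Readiness and Liveness Checks

In a Kubernetes environment we usually need two kinds of health checks:

  • Liveness probe: detects whether the application is still running. If the check fails, Kubernetes restarts the container.
  • Readiness probe: detects whether the application is ready to receive traffic. If the check fails, Kubernetes removes the Pod from the Service endpoints.

We can implement both checks in the health check controller: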

// src/health/health.controller.ts
import { Controller, Get } from '@nestjs/common';
import { HealthCheck, HealthCheckService, HttpHealthIndicator, TypeOrmHealthIndicator } from '@nestjs/terminus';

@Controller('health')
export class HealthController {
  constructor(
    private health: HealthCheckService,
    private http: HttpHealthIndicator,
    private db: TypeOrmHealthIndicator,
  ) {}

  // Liveness check
  @Get('liveness')
  @HealthCheck()
  async liveness() {
    return this.health.check([]);
  }

  // Readiness check
  @Get('readiness')
  @HealthCheck()
  async readiness() {
    return this.health.check([
      () => this.db.pingCheck('database'),
      () => this.http.pingCheck('nestjs-docs', 'https://docs.nestjs.com'),
    ]);
  }
}

Then reference the endpoints in the Kubernetes manifest:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nestjs-application
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nestjs-application
  template:
    metadata:
      labels:
        app: nestjs-application
    spec:
      containers:
      - name: nestjs-application
        image: nestjs-application:latest
        ports:
        - containerPort: 3000
        livenessProbe:
          httpGet:
            path: /health/liveness
            port: 3000
          initialDelaySeconds: 30
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /health/readiness
            port: 3000
          initialDelaySeconds: 5
          periodSeconds: 5
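
The probe settings determine how quickly Kubernetes reacts to a failure. As a rough sketch, the time between a failure starting and Kubernetes acting on it is bounded by failureThreshold multiplied by periodSeconds, ignoring probe timeouts and scheduling jitter. The manifest above omits failureThreshold, so the Kubernetes default of 3 applies. ProbeConfig and detectionWindowSeconds are hypothetical names for this illustration.

```typescript
interface ProbeConfig {
  initialDelaySeconds: number;
  periodSeconds: number;
  failureThreshold?: number;
}

// Kubernetes uses 3 consecutive failures when failureThreshold is omitted.
const DEFAULT_FAILURE_THRESHOLD = 3;

// Approximate upper bound, in seconds, between a failure starting and
// Kubernetes acting on it (restart for liveness, removal for readiness).
function detectionWindowSeconds(probe: ProbeConfig): number {
  const threshold = probe.failureThreshold ?? DEFAULT_FAILURE_THRESHOLD;
  return threshold * probe.periodSeconds;
}

const liveness: ProbeConfig = { initialDelaySeconds: 30, periodSeconds: 10 };
const readiness: ProbeConfig = { initialDelaySeconds: 5, periodSeconds: 5 };

console.log(detectionWindowSeconds(liveness)); // 30
console.log(detectionWindowSeconds(readiness)); // 15
```

With these values, a stuck container is restarted after roughly 30 seconds, while an unready Pod stops receiving traffic after roughly 15 seconds.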

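Practical Case Study

Case 1: A Complete Health Check System

Requirements

We need to implement a complete health check system that:

  • Checks the basic state of the application
  • Checks the database connection
  • Checks the Redis connection
  • Checks an external API service
  • Integrates with a monitoring system
  • Supports Kubernetes readiness and liveness probes

Implementation

  1. Install the required dependencies
npm install --save @nestjs/terminus @nestjs/axios axios @nestjs/typeorm typeorm mysql2 ioredis prom-client
  2. Create the health check module
// src/health/health.module.ts
import { Module } from '@nestjs/common';
import { HttpModule } from '@nestjs/axios';
import { TerminusModule } from '@nestjs/terminus';
import { TypeOrmModule } from '@nestjs/typeorm';
import { HealthController } from './health.controller';
import { CustomHealthIndicator } from './custom.health';

@Module({
  imports: [
    TerminusModule,
    HttpModule, // required by HttpHealthIndicator
    TypeOrmModule.forRoot({
      type: 'mysql',
      host: process.env.DB_HOST || 'localhost',
      // Falls back to 3306 when DB_PORT is unset or not a number
      port: parseInt(process.env.DB_PORT ?? '', 10) || 3306,
      username: process.env.DB_USERNAME || 'root',
      password: process.env.DB_PASSWORD || 'password',
      database: process.env.DB_NAME || 'nestjs',
      autoLoadEntities: true,
      synchronize: true, // convenient in development; disable in production
    }),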
  ],
  controllers: [HealthController],
  providers: [CustomHealthIndicator],
})
export class HealthModule {}
  3. Create the health check controller
// src/health/health.controller.ts
import { Controller, Get } from '@nestjs/common';
import {
  HealthCheck,
  HealthCheckError,
  HealthCheckService,
  HealthIndicatorResult,
  HttpHealthIndicator,
  TypeOrmHealthIndicator,
} from '@nestjs/terminus';
import { Redis } from 'ioredis';
import { CustomHealthIndicator } from './custom.health';

@Controller('health')
export class HealthController {
  private readonly redisClient: Redis;

  constructor(
    private health: HealthCheckService,
    private http: HttpHealthIndicator,
    private db: TypeOrmHealthIndicator,
    private customHealthIndicator: CustomHealthIndicator,
  ) {
    this.redisClient = new Redis({
      host: process.env.REDIS_HOST || 'localhost',
      // Falls back to 6379 when REDIS_PORT is unset or not a number
      port: parseInt(process.env.REDIS_PORT ?? '', 10) || 6379,
    });
  }

  // Redis check: a failed ping throws HealthCheckError so that the
  // check is reported as failed instead of being returned as a normal result
  private async redisCheck(): Promise<HealthIndicatorResult> {
    try {
      await this.redisClient.ping();
      return { redis: { status: 'up' } };
    } catch (error) {
      const message = error instanceof Error ? error.message : String(error);
      throw new HealthCheckError('Redis check failed', {
        redis: { status: 'down', message },
      });
    }
  }

  // Dependency checks shared by the readiness and full endpoints
  private dependencyChecks() {
    return [
      () => this.db.pingCheck('database'),
      () => this.http.pingCheck('nestjs-docs', 'https://docs.nestjs.com'),
      () => this.redisCheck(),
      () => this.customHealthIndicator.isHealthy('custom-service', { threshold: 0.5 }),
    ];
  }

  // Liveness check
  @Get('liveness')
  @HealthCheck()
  async liveness() {
    return this.health.check([]);
  }

  // Readiness check
  @Get('readiness')
  @HealthCheck()
  async readiness() {
    return this.health.check(this.dependencyChecks());
  }

  // Full health check
  @Get()
  @HealthCheck()
  async check() {
    return this.health.check(this.dependencyChecks());
  }
}
  4. Create the custom health indicator
// src/health/custom.health.ts
import { Injectable } from '@nestjs/common';
import { HealthIndicator, HealthIndicatorResult, HealthCheckError } from '@nestjs/terminus';

@Injectable()
export class CustomHealthIndicator extends HealthIndicator {
  async isHealthy(key: string, options: { threshold: number }): Promise<HealthIndicatorResult> {
    // Simulated check logic (replace with a real measurement)
    const healthStatus = Math.random() > options.threshold;
    
    const result = this.getStatus(key, healthStatus, {
      message: healthStatus ? 'Service is healthy' : 'Service is unhealthy',
      timestamp: new Date().toISOString(),
    });
    
    if (!healthStatus) {
      throw new HealthCheckError('Custom health check failed', result);
    }
    
    return result;
  }
}
  5. Create the metrics module
// src/metrics/metrics.module.ts
import { Module } from '@nestjs/common';
import { MetricsController } from './metrics.controller';
import { MetricsService } from './metrics.service';

@Module({
  controllers: [MetricsController],
  providers: [MetricsService],
  exports: [MetricsService],
})
export class MetricsModule {}
  6. Create the metrics service
// src/metrics/metrics.service.ts
import { Injectable } from '@nestjs/common';
import { register, Counter, Gauge, Histogram } from 'prom-client';

@Injectable()
export class MetricsService {
  private readonly httpRequestsTotal: Counter<string>;
  private readonly httpRequestDurationSeconds: Histogram<string>;
  private readonly appHealth: Gauge<string>;
  private readonly dbHealth: Gauge<string>;
  private readonly redisHealth: Gauge<string>;

  constructor() {
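    // Clear the global registry so that creating this service again
    // (for example in tests) does not fail on duplicate metric names
    register.clear();
    
    // Set default labels
    register.setDefaultLabels({
      app: 'nestjs-application',
    });
    
    // Create the metrics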
    this.httpRequestsTotal = new Counter({
      name: 'http_requests_total',
      help: 'Total number of HTTP requests',
      labelNames: ['method', 'route', 'status'],
    });
    
    this.httpRequestDurationSeconds = new Histogram({
      name: 'http_request_duration_seconds',
      help: 'HTTP request duration in seconds',
      labelNames: ['method', 'route', 'status'],
      buckets: [0.1, 0.5, 1, 2, 5],
    });
    
    this.appHealth = new Gauge({
      name: 'app_health',
      help: 'Application health status',
      labelNames: ['service'],
    });
    
    this.dbHealth = new Gauge({
      name: 'db_health',
      help: 'Database health status',
      labelNames: ['database'],
    });
    
    this.redisHealth = new Gauge({
      name: 'redis_health',
      help: 'Redis health status',
      labelNames: ['service'],
    });
  }

  // Record an HTTP request
  recordHttpRequest(method: string, route: string, status: number, duration: number) {
    this.httpRequestsTotal.labels(method, route, status.toString()).inc();
    this.httpRequestDurationSeconds.labels(method, route, status.toString()).observe(duration);
  }

  // Set the application health status (1 = healthy, 0 = unhealthy)
  setAppHealth(service: string, status: number) {
    this.appHealth.labels(service).set(status);
  }

  // Set the database health status
  setDbHealth(database: string, status: number) {
    this.dbHealth.labels(database).set(status);
  }

  // Set the Redis health status
  setRedisHealth(service: string, status: number) {
    this.redisHealth.labels(service).set(status);
  }

  // Return all metrics in the Prometheus text format
  async getMetrics() {
    return register.metrics();
  }
}
  7. Create the metrics controller
// src/metrics/metrics.controller.ts
import { Controller, Get, Res } from '@nestjs/common';
import { Response } from 'express';
import { MetricsService } from './metrics.service';

@Controller('metrics')
export class MetricsController {
  constructor(private readonly metricsService: MetricsService) {}

  @Get()
  async getMetrics(@Res() res: Response) {
    const metrics = await this.metricsService.getMetrics();
    res.set('Content-Type', 'text/plain');
    res.send(metrics);
  }
}
  8. Import both modules into the application module
// src/app.module.ts
import { Module } from '@nestjs/common';
import { HealthModule } from './health/health.module';
import { MetricsModule } from './metrics/metrics.module';
import { AppController } from './app.controller';
import { AppService } from './app.service';

@Module({
  imports: [
    HealthModule,
    MetricsModule,
  ],
  controllers: [AppController],
  providers: [AppService],
})
export class AppModule {}
  9. Create an HTTP interceptor that records metrics
// src/common/interceptors/metrics.interceptor.ts
import { Injectable, NestInterceptor, ExecutionContext, CallHandler } from '@nestjs/common';
import { Observable } from 'rxjs';
import { tap } from 'rxjs/operators';
import { MetricsService } from '../../metrics/metrics.service';

@Injectable()
export class MetricsInterceptor implements NestInterceptor {
  constructor(private readonly metricsService: MetricsService) {}

  intercept(context: ExecutionContext, next: CallHandler): Observable<any> {
    const now = Date.now();
    const request = context.switchToHttp().getRequest();
    const response = context.switchToHttp().getResponse();
    
    const method = request.method;
    const route = request.route ? request.route.path : request.url;
    
    return next.handle().pipe(
      tap(() => {
        const duration = (Date.now() - now) / 1000;
        const status = response.statusCode;
        
        this.metricsService.recordHttpRequest(method, route, status, duration);
      }),
    );
  }
}
  10. Register the interceptor in the main file
// src/main.ts
import { NestFactory } from '@nestjs/core';
import { AppModule } from './app.module';
import { MetricsInterceptor } from './common/interceptors/metrics.interceptor';
import { MetricsService } from './metrics/metrics.service';

async function bootstrap() {
  const app = await NestFactory.create(AppModule);
  
  // Get the metrics service
  const metricsService = app.get(MetricsService);
  // Register the metrics interceptor globally
  app.useGlobalInterceptors(new MetricsInterceptor(metricsService));
  
  await app.listen(3000);
}
bootstrap();

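Common Problems and Solutions

1. Health check fails

Possible causes

  • A dependency is unavailable (database, Redis, and so on)
  • The health check is misconfigured
  • Network problems

Solutions

  • Check the state of the dependencies
  • Verify the health check configuration
  • Verify network connectivity

2. Health check responds slowly

Possible causes

  • A dependency responds slowly
  • The health check logic is too complex
  • Too many concurrent requests

Solutions

  • Improve the performance of the dependency
  • Simplify the health check logic
  • Give each health check an explicit timeout and increase it where needed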
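
Adjusting a timeout presupposes that each check has one. For custom checks that lack it, a minimal sketch using Promise.race is shown below; withTimeout is a hypothetical helper, and the millisecond values are arbitrary examples.

```typescript
// Reject if `task` has not settled within `ms` milliseconds.
function withTimeout<T>(task: Promise<T>, ms: number, label: string): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms} ms`)), ms);
  });
  // Whichever promise settles first wins; the timer is always cleaned up.
  return Promise.race([task, timeout]).finally(() => {
    if (timer !== undefined) clearTimeout(timer);
  });
}
```

A custom check could then be written as () => withTimeout(someCheck(), 1500, 'redis'), where someCheck stands for any promise-returning check. The timeout should stay below the timeout of the Kubernetes probe that calls the endpoint, so that the application reports the failure itself instead of the probe simply timing out.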

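3. The monitoring system cannot collect metrics

Possible causes

  • The metrics endpoint is misconfigured
  • Prometheus is misconfigured
  • Network access is restricted

Solutions

  • Check that the metrics endpoint is reachable
  • Verify the Prometheus configuration
  • Review the network access rules

4. Kubernetes readiness check fails

Possible causes

  • A dependency is not ready
  • The application takes too long to initialize
  • The health check is misconfigured

Solutions

  • Make sure the dependencies are ready
  • Increase the initial delay of the readiness probe
  • Verify the health check configuration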

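Best Practices

  1. Layered health checks: implement checks at different levels, such as a basic check and a detailed check
  2. Dependency checks: check the state of every critical dependency
  3. Sensible timeouts: set a reasonable timeout for each health check
  4. Monitoring integration: feed health check results into the monitoring system
  5. Kubernetes support: implement readiness and liveness checks that match what Kubernetes expects
  6. Error handling: add appropriate error handling to health checks
  7. Logging: log health check results in enough detail to diagnose failures
  8. Performance: make sure health checks do not degrade application performance

Code Optimization Suggestions

  1. Use configuration management: move health check settings into configuration files
  2. Add caching: cache health check results to avoid repeated checks
  3. Use asynchronous checks: run health checks asynchronously to improve concurrency
  4. Add metric labels: attach more labels to health metrics to improve observability
  5. Add alerting: trigger an alert notification when a health check fails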
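
The caching suggestion can be sketched as a small wrapper that reuses the last result for a fixed time window. This is one possible approach under stated assumptions, not a feature of @nestjs/terminus; cached is a hypothetical helper, and the injectable clock exists only to make the behavior easy to test.

```typescript
// Wrap a check so that it runs at most once per `ttlMs` milliseconds.
function cached<T>(
  check: () => Promise<T>,
  ttlMs: number,
  now: () => number = Date.now,
): () => Promise<T> {
  let expiresAt = 0;
  let pending: Promise<T> | undefined;
  return () => {
    const t = now();
    if (pending === undefined || t >= expiresAt) {
      expiresAt = t + ttlMs;
      // Caching the promise also collapses concurrent callers into one check.
      pending = check();
    }
    return pending;
  };
}
```

Failures are cached for the same window as successes, which keeps frequent probes from hammering a dependency that is already down. The window should be shorter than the probe period, otherwise a probe may keep seeing a stale result.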

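Summary

The NestJS health check module offers a concise, efficient way to monitor the state of an application. After working through this article, you should know:

  • How to install and configure the health check module
  • How to use the built-in health indicators
  • How to create custom health indicators
  • How to integrate health checks with a monitoring system
  • How to implement Kubernetes readiness and liveness checks
  • Health check best practices and solutions to common problems

Health checks are an important part of modern applications. They help us detect and resolve problems early and improve reliability and availability. Used sensibly, the NestJS health check features make an application more robust and easier to maintain.

Quiz

  1. Which of the following is the correct command to install the NestJS health check module?
    A. npm install --save @nestjs/health
    B. npm install --save @nestjs/terminus
    C. npm install --save health-check
    D. npm install --save terminus

  2. How do you implement a database health check in NestJS?

  3. How do you create a custom health indicator?

  4. What are Kubernetes readiness and liveness checks, and how do they differ?

  5. How do you integrate health checks with the Prometheus monitoring system?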
