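1 Overview

1.1 What is Harbor

Harbor is an enterprise-grade, cloud native container image registry. It was originally developed by VMware and later donated to the Cloud Native Computing Foundation (CNCF). It provides secure, efficient management of Docker images and helps enterprises streamline the delivery of containerized applications. Compared with the plain Docker Registry, Harbor adds enterprise features such as image replication between registries, user management, access control, vulnerability scanning, and image signing.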

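1.2 Harbor features

Harbor's main role is to provide centralized image storage and management for containerized applications. It lets an enterprise store, distribute, and protect container images in one central registry, so that the images used in development, test, and production all meet security and compliance requirements.

Harbor has the following features: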

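Cloud native: Harbor supports both container images and Helm charts, and can serve as the registry for cloud native environments such as container runtimes and orchestration platforms.
Image management: As an enterprise registry, Harbor stores and manages images in Docker and OCI formats.
Fine-grained access control: Role-based access control (RBAC) gives precise control over what each user is allowed to do in the registry.
Image replication: Images can be replicated across multiple Harbor instances, which enables efficient image distribution across data centers or hybrid cloud environments.
Vulnerability scanning: Harbor integrates with security scanners such as Clair or Trivy and can automatically scan images for vulnerabilities, helping keep deployed images secure.
Image signing and content trust: Through Notary integration, Harbor supports signing and verifying images to ensure their integrity and trustworthiness.
Logging and auditing: Detailed operation logs and audit features help enterprises understand how images are used and managed.
Multi-tenancy: Harbor isolates resources by project, which supports image management in multi-tenant environments.
LDAP/AD support: Harbor integrates with an existing enterprise LDAP/AD for user authentication and management. LDAP groups can be imported into Harbor and then granted permissions on specific projects.
Image deletion and garbage collection: System administrators can run garbage collection jobs so that images (dangling manifests and unreferenced blobs) are deleted and their space is freed periodically.
Auditing: All operations on repositories are tracked through logs.
RESTful API: A RESTful API covers management operations and makes integration with external systems straightforward. An embedded Swagger UI is available for exploring and testing the API.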
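
As a rough illustration of the RESTful API item above, the sketch below lists projects with curl. The host address and admin credentials are assumptions (they match the values used in the deployment section of this walkthrough), and the helper names `harbor_api_url` and `harbor_list_projects` are hypothetical. The path `/api/v2.0/projects` is the v2.0 project listing endpoint.

```shell
# Assumed values: replace with your own Harbor address and credentials.
HARBOR_URL="${HARBOR_URL:-http://10.168.1.162}"
HARBOR_USER="${HARBOR_USER:-admin}"
HARBOR_PASS="${HARBOR_PASS:-Harbor12345}"

# Build a full API v2.0 URL from a resource path (hypothetical helper).
# A trailing slash on HARBOR_URL and a leading slash on the path are tolerated.
harbor_api_url() {
  printf '%s/api/v2.0/%s\n' "${HARBOR_URL%/}" "${1#/}"
}

# List the first page of projects visible to the configured user.
harbor_list_projects() {
  curl -fsS -u "${HARBOR_USER}:${HARBOR_PASS}" \
    "$(harbor_api_url 'projects?page=1&page_size=10')"
}

# Usage (requires a running Harbor instance):
#   harbor_list_projects
```

Keeping the URL construction in one small function makes it easy to point the same calls at a different instance by exporting `HARBOR_URL`.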

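1.3 Harbor architecture

Harbor's architecture follows microservice principles: it consists of multiple loosely coupled components, each responsible for a distinct function. As of v2.0, Harbor is fully OCI compliant.

The figure below shows Harbor's overall architecture.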

2 Deploying Harbor

2.1 Environment

OS: Kylin V10
Docker: 26.1.4
Docker Compose: v2.27.1
Harbor: 2.12.3
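
To check whether another host matches the environment listed above, a version comparison along the following lines can help. This is a minimal sketch: the thresholds are simply the versions used in this walkthrough, not Harbor's official minimums, and the function names `version_ge` and `check_versions` are hypothetical. It assumes `sort -V` (version sort, as provided by GNU coreutils) is available.

```shell
# Return success when version $1 is greater than or equal to version $2.
# Relies on `sort -V` (version sort) from GNU coreutils.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Compare the local Docker and Compose versions with the ones used in this walkthrough.
check_versions() {
  docker_v="$(docker version --format '{{.Server.Version}}')"
  compose_v="$(docker compose version --short)"
  # Strip a leading "v" in case the Compose version is reported as "v2.x.y".
  version_ge "$docker_v" 26.1.4 || echo "Docker ${docker_v} is older than 26.1.4"
  version_ge "${compose_v#v}" 2.27.1 || echo "Docker Compose ${compose_v} is older than 2.27.1"
}

# Usage (requires Docker to be installed):
#   check_versions
```

Older versions may still work. Harbor's installation prerequisites page is the authoritative source for the supported minimums.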

2.2 Installation

Harbor official installation documentation

GitHub repository

Steps to complete before and during installation:

Make sure that your target host meets the Harbor Installation Prerequisites.
Download the Harbor Installer
Configure HTTPS Access to Harbor
Configure the Harbor YML File
Configure Enabling Internal TLS
Run the Installer Script

If installation fails, see Troubleshooting Harbor Installation.

Download the Harbor offline installer. This walkthrough uses version 2.12.3.

Detailed installation steps:

[root@kylin01 opt]# wget https://github.com/goharbor/harbor/releases/download/v2.12.3/harbor-offline-installer-v2.12.3.tgz
[root@kylin01 opt]# tar zxvf harbor-offline-installer-v2.12.3.tgz
harbor/harbor.v2.12.3.tar.gz
harbor/prepare
harbor/LICENSE
harbor/install.sh
harbor/common.sh
harbor/harbor.yml.tmpl

Edit the harbor.yml configuration file

[root@kylin01 harbor]# cp harbor.yml.tmpl harbor.yml

# Configuration file of Harbor

# The IP address or hostname to access admin UI and registry service.
# DO NOT use localhost or 127.0.0.1, because Harbor needs to be accessed by external clients.
# Set this to the IP address of the Harbor host
hostname: 10.168.1.162

# http related config
http:
  # port for http, default is 80. If https enabled, this port will redirect to https port
  port: 80

# https related config
# If SSL is not configured, comment out the whole https block
#https:
  # https port for harbor, default is 443
#  port: 443
  # The path of cert and key files for nginx
#  certificate: /your/certificate/path
#  private_key: /your/private/key/path
  # enable strong ssl ciphers (default: false)
  # strong_ssl_ciphers: false

# # Harbor will set ipv4 enabled only by default if this block is not configured
# # Otherwise, please uncomment this block to configure your own ip_family stacks
# ip_family:
#   # ipv6Enabled set to true if ipv6 is enabled in docker network, currently it affected the nginx related component
#   ipv6:
#     enabled: false
#   # ipv4Enabled set to true by default, currently it affected the nginx related component
#   ipv4:
#     enabled: true

# # Uncomment following will enable tls communication between all harbor components
# internal_tls:
#   # set enabled to true means internal tls is enabled
#   enabled: true
#   # put your cert and key files on dir
#   dir: /etc/harbor/tls/internal


# Uncomment external_url if you want to enable external proxy
# And when it enabled the hostname will no longer used
# external_url: https://reg.mydomain.com:8433

# The initial password of Harbor admin
# It only works in first time to install harbor
# Remember Change the admin password from UI after launching Harbor.
harbor_admin_password: Harbor12345

# Harbor DB configuration
database:
  # The password for the user('postgres' by default) of Harbor DB. Change this before any production use.
  password: root123
  # The maximum number of connections in the idle connection pool. If it <=0, no idle connections are retained.
  max_idle_conns: 100
  # The maximum number of open connections to the database. If it <= 0, then there is no limit on the number of open connections.
  # Note: the default number of connections is 1024 for postgres of harbor.
  max_open_conns: 900
  # The maximum amount of time a connection may be reused. Expired connections may be closed lazily before reuse. If it <= 0, connections are not closed due to a connection's age.
  # The value is a duration string. A duration string is a possibly signed sequence of decimal numbers, each with optional fraction and a unit suffix, such as "300ms", "-1.5h" or "2h45m". Valid time units are "ns", "us" (or "µs"), "ms", "s", "m", "h".
  conn_max_lifetime: 5m
  # The maximum amount of time a connection may be idle. Expired connections may be closed lazily before reuse. If it <= 0, connections are not closed due to a connection's idle time.
  # The value is a duration string. A duration string is a possibly signed sequence of decimal numbers, each with optional fraction and a unit suffix, such as "300ms", "-1.5h" or "2h45m". Valid time units are "ns", "us" (or "µs"), "ms", "s", "m", "h".
  conn_max_idle_time: 0

# The default data volume
# Change the storage directory to suit your environment
data_volume: /data

# Harbor Storage settings by default is using /data dir on local filesystem
# Uncomment storage_service setting If you want to using external storage
# storage_service:
#   # ca_bundle is the path to the custom root ca certificate, which will be injected into the truststore
#   # of registry's containers.  This is usually needed when the user hosts a internal storage with self signed certificate.
#   ca_bundle:

#   # storage backend, default is filesystem, options include filesystem, azure, gcs, s3, swift and oss
#   # for more info about this configuration please refer https://distribution.github.io/distribution/about/configuration/
#   # and https://distribution.github.io/distribution/storage-drivers/
#   filesystem:
#     maxthreads: 100
#   # set disable to true when you want to disable registry redirect
#   redirect:
#     disable: false

# Trivy configuration
#
# Trivy DB contains vulnerability information from NVD, Red Hat, and many other upstream vulnerability databases.
# It is downloaded by Trivy from the GitHub release page https://github.com/aquasecurity/trivy-db/releases and cached
# in the local file system. In addition, the database contains the update timestamp so Trivy can detect whether it
# should download a newer version from the Internet or use the cached one. Currently, the database is updated every
# 12 hours and published as a new release to GitHub.
trivy:
  # ignoreUnfixed The flag to display only fixed vulnerabilities
  ignore_unfixed: false
  # skipUpdate The flag to enable or disable Trivy DB downloads from GitHub
  #
  # You might want to enable this flag in test or CI/CD environments to avoid GitHub rate limiting issues.
  # If the flag is enabled you have to download the `trivy-offline.tar.gz` archive manually, extract `trivy.db` and
  # `metadata.json` files and mount them in the `/home/scanner/.cache/trivy/db` path.
  skip_update: false
  #
  # skipJavaDBUpdate If the flag is enabled you have to manually download the `trivy-java.db` file and mount it in the
  # `/home/scanner/.cache/trivy/java-db/trivy-java.db` path
  skip_java_db_update: false
  #
  # The offline_scan option prevents Trivy from sending API requests to identify dependencies.
  # Scanning JAR files and pom.xml may require Internet access for better detection, but this option tries to avoid it.
  # For example, the offline mode will not try to resolve transitive dependencies in pom.xml when the dependency doesn't
  # exist in the local repositories. It means a number of detected vulnerabilities might be fewer in offline mode.
  # It would work if all the dependencies are in local.
  # This option doesn't affect DB download. You need to specify "skip-update" as well as "offline-scan" in an air-gapped environment.
  offline_scan: false
  #
  # Comma-separated list of what security issues to detect. Possible values are `vuln`, `config` and `secret`. Defaults to `vuln`.
  security_check: vuln
  #
  # insecure The flag to skip verifying registry certificate
  insecure: false
  #
  # timeout The duration to wait for scan completion.
  # There is upper bound of 30 minutes defined in scan job. So if this `timeout` is larger than 30m0s, it will also timeout at 30m0s.
  timeout: 5m0s
  #
  # github_token The GitHub access token to download Trivy DB
  #
  # Anonymous downloads from GitHub are subject to the limit of 60 requests per hour. Normally such rate limit is enough
  # for production operations. If, for any reason, it's not enough, you could increase the rate limit to 5000
  # requests per hour by specifying the GitHub access token. For more details on GitHub rate limiting please consult
  # https://docs.github.com/rest/overview/resources-in-the-rest-api#rate-limiting
  #
  # You can create a GitHub token by following the instructions in
  # https://help.github.com/en/github/authenticating-to-github/creating-a-personal-access-token-for-the-command-line
  #
  # github_token: xxx

jobservice:
  # Maximum number of job workers in job service
  max_job_workers: 10
  # The jobLoggers backend name, only support "STD_OUTPUT", "FILE" and/or "DB"
  job_loggers:
    - STD_OUTPUT
    - FILE
    # - DB
  # The jobLogger sweeper duration (ignored if `jobLogger` is `stdout`)
  logger_sweeper_duration: 1 #days

notification:
  # Maximum retry count for webhook job
  webhook_job_max_retry: 3
  # HTTP client timeout for webhook job
  webhook_job_http_client_timeout: 3 #seconds

# Log configurations
log:
  # options are debug, info, warning, error, fatal
  level: info
  # configs for logs in local storage
  local:
    # Log files are rotated log_rotate_count times before being removed. If count is 0, old versions are removed rather than rotated.
    rotate_count: 50
    # Log files are rotated only if they grow bigger than log_rotate_size bytes. If size is followed by k, the size is assumed to be in kilobytes.
    # If the M is used, the size is in megabytes, and if G is used, the size is in gigabytes. So size 100, size 100k, size 100M and size 100G
    # are all valid.
    rotate_size: 200M
    # The directory on your host that store log
    location: /var/log/harbor

  # Uncomment following lines to enable external syslog endpoint.
  # external_endpoint:
  #   # protocol used to transmit log to external endpoint, options is tcp or udp
  #   protocol: tcp
  #   # The host of external endpoint
  #   host: localhost
  #   # Port of external endpoint
  #   port: 5140

#This attribute is for migrator to detect the version of the .cfg file, DO NOT MODIFY!
_version: 2.12.0

# Uncomment external_database if using external database.
# external_database:
#   harbor:
#     host: harbor_db_host
#     port: harbor_db_port
#     db_name: harbor_db_name
#     username: harbor_db_username
#     password: harbor_db_password
#     ssl_mode: disable
#     max_idle_conns: 2
#     max_open_conns: 0

# Uncomment redis if need to customize redis db
# redis:
#   # db_index 0 is for core, it's unchangeable
#   # registry_db_index: 1
#   # jobservice_db_index: 2
#   # trivy_db_index: 5
#   # it's optional, the db for harbor business misc, by default is 0, uncomment it if you want to change it.
#   # harbor_db_index: 6
#   # it's optional, the db for harbor cache layer, by default is 0, uncomment it if you want to change it.
#   # cache_layer_db_index: 7

# Uncomment external_redis if using external Redis server
# external_redis:
#   # support redis, redis+sentinel
#   # host for redis: <host_redis>:<port_redis>
#   # host for redis+sentinel:
#   #  <host_sentinel1>:<port_sentinel1>,<host_sentinel2>:<port_sentinel2>,<host_sentinel3>:<port_sentinel3>
#   host: redis:6379
#   password: 
#   # Redis AUTH command was extended in Redis 6, it is possible to use it in the two-arguments AUTH <username> <password> form.
#   # there's a known issue when using external redis username ref:https://github.com/goharbor/harbor/issues/18892
#   # if you care about the image pull/push performance, please refer to this https://github.com/goharbor/harbor/wiki/Harbor-FAQs#external-redis-username-password-usage
#   # username:
#   # sentinel_master_set must be set to support redis+sentinel
#   #sentinel_master_set:
#   # db_index 0 is for core, it's unchangeable
#   registry_db_index: 1
#   jobservice_db_index: 2
#   trivy_db_index: 5
#   idle_timeout_seconds: 30
#   # it's optional, the db for harbor business misc, by default is 0, uncomment it if you want to change it.
#   # harbor_db_index: 6
#   # it's optional, the db for harbor cache layer, by default is 0, uncomment it if you want to change it.
#   # cache_layer_db_index: 7

# Uncomment uaa for trusting the certificate of uaa instance that is hosted via self-signed cert.
# uaa:
#   ca_file: /path/to/ca

# Global proxy
# Config http proxy for components, e.g. http://my.proxy.com:3128
# Components doesn't need to connect to each others via http proxy.
# Remove component from `components` array if want disable proxy
# for it. If you want use proxy for replication, MUST enable proxy
# for core and jobservice, and set `http_proxy` and `https_proxy`.
# Add domain to the `no_proxy` field, when you want disable proxy
# for some special registry.
proxy:
  http_proxy:
  https_proxy:
  no_proxy:
  components:
    - core
    - jobservice
    - trivy

# metric:
#   enabled: false
#   port: 9090
#   path: /metrics

# Trace related config
# only can enable one trace provider(jaeger or otel) at the same time,
# and when using jaeger as provider, can only enable it with agent mode or collector mode.
# if using jaeger collector mode, uncomment endpoint and uncomment username, password if needed
# if using jaeger agent mode, uncomment agent_host and agent_port
# trace:
#   enabled: true
#   # set sample_rate to 1 if you wanna sampling 100% of trace data; set 0.5 if you wanna sampling 50% of trace data, and so forth
#   sample_rate: 1
#   # # namespace used to differentiate different harbor services
#   # namespace:
#   # # attributes is a key value dict contains user defined attributes used to initialize trace provider
#   # attributes:
#   #   application: harbor
#   # # jaeger should be 1.26 or newer.
#   # jaeger:
#   #   endpoint: http://hostname:14268/api/traces
#   #   username:
#   #   password:
#   #   agent_host: hostname
#   #   # export trace data by jaeger.thrift in compact mode
#   #   agent_port: 6831
#   # otel:
#   #   endpoint: hostname:4318
#   #   url_path: /v1/traces
#   #   compression: false
#   #   insecure: true
#   #   # timeout is in seconds
#   #   timeout: 10

# Enable purge _upload directories
upload_purging:
  enabled: true
  # remove files in _upload directories which exist for a period of time, default is one week.
  age: 168h
  # the interval of the purge operations
  interval: 24h
  dryrun: false

# Cache layer configurations
# If this feature enabled, harbor will cache the resource
# `project/project_metadata/repository/artifact/manifest` in the redis
# which can especially help to improve the performance of high concurrent
# manifest pulling.
# NOTICE
# If you are deploying Harbor in HA mode, make sure that all the harbor
# instances have the same behaviour, all with caching enabled or disabled,
# otherwise it can lead to potential data inconsistency.
cache:
  # not enabled by default
  enabled: false
  # keep cache for one day by default
  expire_hours: 24

# Harbor core configurations
# Uncomment to enable the following harbor core related configuration items.
# core:
#   # The provider for updating project quota(usage), there are 2 options, redis or db,
#   # by default is implemented by db but you can switch the updation via redis which
#   # can improve the performance of high concurrent pushing to the same project,
#   # and reduce the database connections spike and occupies.
#   # By redis will bring up some delay for quota usage updation for display, so only
#   # suggest switch provider to redis if you were ran into the db connections spike around
#   # the scenario of high concurrent pushing to same project, no improvement for other scenes.
#   quota_update_provider: redis # Or db
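
Before running the installer, it can be worth sanity-checking the few values edited above (hostname, the https block, data_volume). The following is a minimal sketch, not part of Harbor itself: the function names are hypothetical, and the parsing only handles simple top-level `key: value` lines such as the ones in this file.

```shell
# Print the value of a top-level key from a YAML file.
# Only plain "key: value" lines at column 0 are handled.
yaml_top_level() {
  # $1 = file, $2 = key
  sed -n "s/^$2:[[:space:]]*//p" "$1" | head -n 1
}

# Check the settings edited in this walkthrough and print a short summary.
check_harbor_yml() {
  file="${1:-harbor.yml}"
  host="$(yaml_top_level "$file" hostname)"
  case "$host" in
    ''|localhost|127.0.0.1)
      echo "hostname must be an externally reachable address, got: '${host}'" >&2
      return 1 ;;
  esac
  # An uncommented top-level "https:" block needs certificate and private_key paths.
  if grep -q '^https:' "$file"; then
    echo "https is enabled: check that certificate and private_key point to real files" >&2
  fi
  echo "hostname=${host} data_volume=$(yaml_top_level "$file" data_volume)"
}

# Usage, from the harbor directory:
#   check_harbor_yml harbor.yml
```

The hostname check mirrors the warning in the template itself: external clients must be able to reach Harbor at that address, so localhost and 127.0.0.1 are rejected.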

Prepare the configuration and deploy

[root@kylin01 harbor]# ./prepare && ./install.sh
.......
[Step 5]: starting Harbor ...
[+] Running 10/10
 ✔ Network harbor_harbor        Created                                                                                                                                  0.1s 
 ✔ Container harbor-log         Started                                                                                                                                  0.3s 
 ✔ Container harbor-portal      Started                                                                                                                                  0.6s 
 ✔ Container registry           Started                                                                                                                                  0.6s 
 ✔ Container harbor-db          Started                                                                                                                                  0.6s 
 ✔ Container redis              Started                                                                                                                                  0.6s 
 ✔ Container registryctl        Started                                                                                                                                  0.6s 
 ✔ Container harbor-core        Started                                                                                                                                  0.8s 
 ✔ Container harbor-jobservice  Started                                                                                                                                  1.0s 
 ✔ Container nginx              Started                                                                                                                                  1.0s 
✔ ----Harbor has been installed and started successfully.----

Check the status of the Harbor containers

[root@kylin01 harbor]# docker compose ps
NAME                IMAGE                                 COMMAND                   SERVICE       CREATED              STATUS                        PORTS
harbor-core         goharbor/harbor-core:v2.12.3          "/harbor/entrypoint.…"   core          About a minute ago   Up About a minute (healthy)   
harbor-db           goharbor/harbor-db:v2.12.3            "/docker-entrypoint.…"   postgresql    About a minute ago   Up About a minute (healthy)   
harbor-jobservice   goharbor/harbor-jobservice:v2.12.3    "/harbor/entrypoint.…"   jobservice    About a minute ago   Up About a minute (healthy)   
harbor-log          goharbor/harbor-log:v2.12.3           "/bin/sh -c /usr/loc…"   log           About a minute ago   Up About a minute (healthy)   127.0.0.1:1514->10514/tcp
harbor-portal       goharbor/harbor-portal:v2.12.3        "nginx -g 'daemon of…"   portal        About a minute ago   Up About a minute (healthy)   
nginx               goharbor/nginx-photon:v2.12.3         "nginx -g 'daemon of…"   proxy         About a minute ago   Up About a minute (healthy)   0.0.0.0:80->8080/tcp, :::80->8080/tcp
redis               goharbor/redis-photon:v2.12.3         "redis-server /etc/r…"   redis         About a minute ago   Up About a minute (healthy)   
registry            goharbor/registry-photon:v2.12.3      "/home/harbor/entryp…"   registry      About a minute ago   Up About a minute (healthy)   
registryctl         goharbor/harbor-registryctl:v2.12.3   "/home/harbor/start.…"   registryctl   About a minute ago   Up About a minute (healthy)
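
Once all containers report healthy, a quick end-to-end check is to log in from a Docker client and push a test image. The sketch below assumes the HTTP-only setup from this walkthrough, the default public project `library`, and a locally available `busybox:latest` image. The helper names are hypothetical. Because HTTPS is commented out in harbor.yml, the Docker client has to list the registry under `insecure-registries` in /etc/docker/daemon.json and be restarted before login will succeed.

```shell
# Assumed registry address: the hostname configured in harbor.yml.
HARBOR_HOST="${HARBOR_HOST:-10.168.1.162}"

# Compose a Harbor image reference: <host>/<project>/<repository>:<tag>
harbor_image_ref() {
  printf '%s/%s/%s:%s\n' "$HARBOR_HOST" "$1" "$2" "${3:-latest}"
}

# Tag a local image for Harbor and push it.
harbor_push() {
  # $1 = local image, $2 = project, $3 = repository, $4 = tag (optional)
  target="$(harbor_image_ref "$2" "$3" "$4")"
  docker tag "$1" "$target" && docker push "$target"
}

# Usage, on a client whose /etc/docker/daemon.json contains
#   {"insecure-registries": ["10.168.1.162"]}
# and whose Docker daemon has been restarted:
#   curl -fsS "http://${HARBOR_HOST}/api/v2.0/health"
#   docker login -u admin "$HARBOR_HOST"
#   harbor_push busybox:latest library busybox 1.0
```

The pushed image should then appear under the library project in the Harbor web UI.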

2.3 Installing highly available Harbor with Helm

2.3.1 Prerequisites

Kubernetes cluster 1.10+
Helm 2.8.0+
Highly available ingress controller
Highly available PostgreSQL 9.6+
Highly available Redis
PVC that can be shared across nodes or external object storage

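2.3.2 Architecture

Most of Harbor's components are now stateless, so we can simply increase the number of Pod replicas, make sure the components are spread across multiple worker nodes, and rely on the Kubernetes Service mechanism for connectivity between Pods.

For the storage layer, users are expected to provide highly available PostgreSQL and Redis clusters for application data, plus PVCs or object storage for images and charts.

2.3.3 Installing the Helm chart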

helm repo add harbor https://helm.goharbor.io
helm fetch harbor/harbor --untar

2.3.4 Configuring values.yaml

Configure the ingress URL in expose.ingress.hosts.core
Configure the external URL in externalURL
Configure external PostgreSQL
- Set database.type to external and fill in the database.external section.
- An empty database must be created in advance. By default the database name is registry, but it can be changed by setting coreDatabase.
Configure external Redis
- Set redis.type to external and fill in the redis.external section.
- TLS and Redis Cluster are not currently supported.
Configure storage
- The recommended approach is to provision the volumes for images, charts, and job logs from a StorageClass that supports ReadWriteMany sharing across nodes, so that components can be scaled on demand. If that volume type is not your default StorageClass, set it in:
  - persistence.persistentVolumeClaim.registry.storageClass
  - persistence.persistentVolumeClaim.chartmuseum.storageClass
  - persistence.persistentVolumeClaim.jobservice.storageClass
- When using such a StorageClass, set the accessMode of the following fields to ReadWriteMany:
  - persistence.persistentVolumeClaim.registry.accessMode
  - persistence.persistentVolumeClaim.chartmuseum.accessMode
  - persistence.persistentVolumeClaim.jobservice.accessMode
- Alternatively, store the data in existing PVCs by setting:
  - persistence.persistentVolumeClaim.registry.existingClaim
  - persistence.persistentVolumeClaim.chartmuseum.existingClaim
  - persistence.persistentVolumeClaim.jobservice.existingClaim
- Finally, if you have no StorageClass that supports ReadWriteMany, or prefer not to use one, you can keep images and charts in external object storage and job logs in the database. To enable external object storage, set persistence.imageChartStorage.type to the desired backend and fill in the corresponding section, then set jobservice.jobLogger to database.
Configure replicas
- Set portal.replicas, core.replicas, jobservice.replicas, registry.replicas, and chartmuseum.replicas to n (n >= 2).
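
The settings above can also be passed on the command line instead of editing values.yaml. The following is a minimal sketch under stated assumptions: the host names, the release name `my-release`, and the helper `harbor_ha_flags` are placeholders, and the database and Redis credentials (for example database.external.username and database.external.password) still have to be supplied separately. chartmuseum.replicas is left out here; add it if your chart version still ships that component.

```shell
# Emit the --set flags for a highly available install.
# All argument values are placeholders chosen by the caller.
harbor_ha_flags() {
  # $1 = ingress host, $2 = PostgreSQL host, $3 = Redis address (host:port), $4 = replica count
  cat <<EOF
--set expose.ingress.hosts.core=$1
--set externalURL=https://$1
--set database.type=external
--set database.external.host=$2
--set redis.type=external
--set redis.external.addr=$3
--set portal.replicas=$4
--set core.replicas=$4
--set jobservice.replicas=$4
--set registry.replicas=$4
EOF
}

# Usage, from the directory that contains the chart fetched with
# `helm fetch harbor/harbor --untar` (values must not contain spaces,
# because the output is deliberately word-split into arguments):
#   helm install my-release ./harbor \
#     $(harbor_ha_flags harbor.example.com pg.example.com redis.example.com:6379 2)
```

For anything beyond a quick trial, keeping these settings in a versioned values.yaml and passing it with `-f` is usually easier to review than a long list of `--set` flags.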