1 概述
1.1 什么是harbor
Harbor 是一个企业级的云原生容器镜像仓库,由 VMware 主导开发并贡献给 Cloud Native Computing Foundation (CNCF)。它通过为 Docker 镜像提供安全、高效的管理能力,帮助企业简化容器应用程序的交付流程。相比于传统的 Docker Registry,Harbor 提供了更多的企业级特性,如容器镜像仓库之间的镜像复制、用户管理、访问控制、漏洞扫描和镜像签名等功能
1.2 Harbor的特性
Harbor 的主要作用是为容器化应用程序提供集中式的镜像存储管理。它允许企业通过集中的仓库存储、分发和保护容器镜像,确保开发、测试和生产环境中使用的镜像都符合安全和合规要求。
Harbor有以下特性:
基于云原生场景: Harbor 支持容器镜像和 Helm Chart,可用作容器Runtime和编排平台等云原生环境的镜像仓库。
镜像管理: Harbor 作为一个企业级的镜像仓库,支持 Docker 和 OCI 格式镜像的存储和管理。
细粒度的访问控制: 通过基于角色的访问控制 (RBAC),Harbor 能够确保不同用户在仓库中的操作权限得到精确控制。
镜像复制: 支持跨多个 Harbor 实例进行镜像复制,帮助实现多数据中心或混合云环境下的高效镜像分发。
漏洞扫描: 集成了 Clair 或 Trivy 等安全工具,Harbor 可以自动扫描镜像中的安全漏洞,确保部署的镜像安全。
镜像签名和内容信任: 通过 Notary 集成,Harbor 支持镜像的签名和验证,确保镜像的完整性和可信度。
日志与审计: 提供详细的操作日志和审计功能,帮助企业了解镜像的使用和管理情况。
多租户支持: Harbor 支持项目隔离,帮助企业实现多租户环境下的镜像管理。
LDAP/AD 支持:Harbor 与现有的企业 LDAP/AD 集成以进行用户身份验证和管理,并支持将 LDAP 组导入 Harbor,然后可以授予特定项目的权限。
镜像删除和垃圾收集:系统管理员可以运行垃圾回收作业,以便可以删除镜像(悬挂的manifests 和未引用的blobs),并且可以定期释放这些空间。
审核:对存储库的所有操作都通过日志进行跟踪。
RESTful API:提供 RESTful API 以方便管理操作,并且易于使用以与外部系统集成。嵌入式 Swagger UI 可用于探索和测试 API。
1.3 Harbor架构
Harbor 的架构设计遵循微服务原则,由多个松耦合的组件组成,每个组件负责不同的功能模块。在V2.0版本,已经完全符合 OCI 标准。
下图是 Harbor 的整体架构
2 Harbor部署
2.1 环境配置:
系统:kylin V10
docker: 26.1.4
docker compose:v2.27.1
Harbor:2.12.3
2.2 部署安装
Harbor官网安装文档
Github仓库地址
安装前准备工作:
Make sure that your target host meets the Harbor Installation Prerequisites.
If installation fails, see Troubleshooting Harbor Installation.
下载Harbor离线包,我们选择1.12.3版本
详细安装方法
[root@kylin01 opt]# wget https://github.com/goharbor/harbor/releases/download/v2.12.3/harbor-offline-installer-v2.12.3.tgz
[root@kylin01 opt]# tar zxvf harbor-offline-installer-v2.12.3.tgz
harbor/harbor.v2.12.3.tar.gz
harbor/prepare
harbor/LICENSE
harbor/install.sh
harbor/common.sh
harbor/harbor.yml.tmpl
修改harbor.yml配置文件
[root@kylin01 harbor]# cp harbor.yml.tmpl harbor.yml
# Configuration file of Harbor
# The IP address or hostname to access admin UI and registry service.
# DO NOT use localhost or 127.0.0.1, because Harbor needs to be accessed by external clients.
# 修改主机ip
hostname: 10.168.1.162
# http related config
http:
# port for http, default is 80. If https enabled, this port will redirect to https port
port: 80
# https related config
# 如果没有配置SSL,注释HTTPS
#https:
# https port for harbor, default is 443
# port: 443
# The path of cert and key files for nginx
# certificate: /your/certificate/path
# private_key: /your/private/key/path
# enable strong ssl ciphers (default: false)
# strong_ssl_ciphers: false
# # Harbor will set ipv4 enabled only by default if this block is not configured
# # Otherwise, please uncomment this block to configure your own ip_family stacks
# ip_family:
# # ipv6Enabled set to true if ipv6 is enabled in docker network, currently it affected the nginx related component
# ipv6:
# enabled: false
# # ipv4Enabled set to true by default, currently it affected the nginx related component
# ipv4:
# enabled: true
# # Uncomment following will enable tls communication between all harbor components
# internal_tls:
# # set enabled to true means internal tls is enabled
# enabled: true
# # put your cert and key files on dir
# dir: /etc/harbor/tls/internal
# Uncomment external_url if you want to enable external proxy
# And when it enabled the hostname will no longer used
# external_url: https://reg.mydomain.com:8433
# The initial password of Harbor admin
# It only works in first time to install harbor
# Remember Change the admin password from UI after launching Harbor.
harbor_admin_password: Harbor12345
# Harbor DB configuration
database:
# The password for the user('postgres' by default) of Harbor DB. Change this before any production use.
password: root123
# The maximum number of connections in the idle connection pool. If it <=0, no idle connections are retained.
max_idle_conns: 100
# The maximum number of open connections to the database. If it <= 0, then there is no limit on the number of open connections.
# Note: the default number of connections is 1024 for postgres of harbor.
max_open_conns: 900
# The maximum amount of time a connection may be reused. Expired connections may be closed lazily before reuse. If it <= 0, connections are not closed due to a connection's age.
# The value is a duration string. A duration string is a possibly signed sequence of decimal numbers, each with optional fraction and a unit suffix, such as "300ms", "-1.5h" or "2h45m". Valid time units are "ns", "us" (or "µs"), "ms", "s", "m", "h".
conn_max_lifetime: 5m
# The maximum amount of time a connection may be idle. Expired connections may be closed lazily before reuse. If it <= 0, connections are not closed due to a connection's idle time.
# The value is a duration string. A duration string is a possibly signed sequence of decimal numbers, each with optional fraction and a unit suffix, such as "300ms", "-1.5h" or "2h45m". Valid time units are "ns", "us" (or "µs"), "ms", "s", "m", "h".
conn_max_idle_time: 0
# The default data volume
# 依据实际情况修改存储目录
data_volume: /data
# Harbor Storage settings by default is using /data dir on local filesystem
# Uncomment storage_service setting If you want to using external storage
# storage_service:
# # ca_bundle is the path to the custom root ca certificate, which will be injected into the truststore
# # of registry's containers. This is usually needed when the user hosts a internal storage with self signed certificate.
# ca_bundle:
# # storage backend, default is filesystem, options include filesystem, azure, gcs, s3, swift and oss
# # for more info about this configuration please refer https://distribution.github.io/distribution/about/configuration/
# # and https://distribution.github.io/distribution/storage-drivers/
# filesystem:
# maxthreads: 100
# # set disable to true when you want to disable registry redirect
# redirect:
# disable: false
# Trivy configuration
#
# Trivy DB contains vulnerability information from NVD, Red Hat, and many other upstream vulnerability databases.
# It is downloaded by Trivy from the GitHub release page https://github.com/aquasecurity/trivy-db/releases and cached
# in the local file system. In addition, the database contains the update timestamp so Trivy can detect whether it
# should download a newer version from the Internet or use the cached one. Currently, the database is updated every
# 12 hours and published as a new release to GitHub.
trivy:
# ignoreUnfixed The flag to display only fixed vulnerabilities
ignore_unfixed: false
# skipUpdate The flag to enable or disable Trivy DB downloads from GitHub
#
# You might want to enable this flag in test or CI/CD environments to avoid GitHub rate limiting issues.
# If the flag is enabled you have to download the `trivy-offline.tar.gz` archive manually, extract `trivy.db` and
# `metadata.json` files and mount them in the `/home/scanner/.cache/trivy/db` path.
skip_update: false
#
# skipJavaDBUpdate If the flag is enabled you have to manually download the `trivy-java.db` file and mount it in the
# `/home/scanner/.cache/trivy/java-db/trivy-java.db` path
skip_java_db_update: false
#
# The offline_scan option prevents Trivy from sending API requests to identify dependencies.
# Scanning JAR files and pom.xml may require Internet access for better detection, but this option tries to avoid it.
# For example, the offline mode will not try to resolve transitive dependencies in pom.xml when the dependency doesn't
# exist in the local repositories. It means a number of detected vulnerabilities might be fewer in offline mode.
# It would work if all the dependencies are in local.
# This option doesn't affect DB download. You need to specify "skip-update" as well as "offline-scan" in an air-gapped environment.
offline_scan: false
#
# Comma-separated list of what security issues to detect. Possible values are `vuln`, `config` and `secret`. Defaults to `vuln`.
security_check: vuln
#
# insecure The flag to skip verifying registry certificate
insecure: false
#
# timeout The duration to wait for scan completion.
# There is upper bound of 30 minutes defined in scan job. So if this `timeout` is larger than 30m0s, it will also timeout at 30m0s.
timeout: 5m0s
#
# github_token The GitHub access token to download Trivy DB
#
# Anonymous downloads from GitHub are subject to the limit of 60 requests per hour. Normally such rate limit is enough
# for production operations. If, for any reason, it's not enough, you could increase the rate limit to 5000
# requests per hour by specifying the GitHub access token. For more details on GitHub rate limiting please consult
# https://docs.github.com/rest/overview/resources-in-the-rest-api#rate-limiting
#
# You can create a GitHub token by following the instructions in
# https://help.github.com/en/github/authenticating-to-github/creating-a-personal-access-token-for-the-command-line
#
# github_token: xxx
jobservice:
# Maximum number of job workers in job service
max_job_workers: 10
# The jobLoggers backend name, only support "STD_OUTPUT", "FILE" and/or "DB"
job_loggers:
- STD_OUTPUT
- FILE
# - DB
# The jobLogger sweeper duration (ignored if `jobLogger` is `stdout`)
logger_sweeper_duration: 1 #days
notification:
# Maximum retry count for webhook job
webhook_job_max_retry: 3
# HTTP client timeout for webhook job
webhook_job_http_client_timeout: 3 #seconds
# Log configurations
log:
# options are debug, info, warning, error, fatal
level: info
# configs for logs in local storage
local:
# Log files are rotated log_rotate_count times before being removed. If count is 0, old versions are removed rather than rotated.
rotate_count: 50
# Log files are rotated only if they grow bigger than log_rotate_size bytes. If size is followed by k, the size is assumed to be in kilobytes.
# If the M is used, the size is in megabytes, and if G is used, the size is in gigabytes. So size 100, size 100k, size 100M and size 100G
# are all valid.
rotate_size: 200M
# The directory on your host that store log
location: /var/log/harbor
# Uncomment following lines to enable external syslog endpoint.
# external_endpoint:
# # protocol used to transmit log to external endpoint, options is tcp or udp
# protocol: tcp
# # The host of external endpoint
# host: localhost
# # Port of external endpoint
# port: 5140
#This attribute is for migrator to detect the version of the .cfg file, DO NOT MODIFY!
_version: 2.12.0
# Uncomment external_database if using external database.
# external_database:
# harbor:
# host: harbor_db_host
# port: harbor_db_port
# db_name: harbor_db_name
# username: harbor_db_username
# password: harbor_db_password
# ssl_mode: disable
# max_idle_conns: 2
# max_open_conns: 0
# Uncomment redis if need to customize redis db
# redis:
# # db_index 0 is for core, it's unchangeable
# # registry_db_index: 1
# # jobservice_db_index: 2
# # trivy_db_index: 5
# # it's optional, the db for harbor business misc, by default is 0, uncomment it if you want to change it.
# # harbor_db_index: 6
# # it's optional, the db for harbor cache layer, by default is 0, uncomment it if you want to change it.
# # cache_layer_db_index: 7
# Uncomment external_redis if using external Redis server
# external_redis:
# # support redis, redis+sentinel
# # host for redis: <host_redis>:<port_redis>
# # host for redis+sentinel:
# # <host_sentinel1>:<port_sentinel1>,<host_sentinel2>:<port_sentinel2>,<host_sentinel3>:<port_sentinel3>
# host: redis:6379
# password:
# # Redis AUTH command was extended in Redis 6, it is possible to use it in the two-arguments AUTH <username> <password> form.
# # there's a known issue when using external redis username ref:https://github.com/goharbor/harbor/issues/18892
# # if you care about the image pull/push performance, please refer to this https://github.com/goharbor/harbor/wiki/Harbor-FAQs#external-redis-username-password-usage
# # username:
# # sentinel_master_set must be set to support redis+sentinel
# #sentinel_master_set:
# # db_index 0 is for core, it's unchangeable
# registry_db_index: 1
# jobservice_db_index: 2
# trivy_db_index: 5
# idle_timeout_seconds: 30
# # it's optional, the db for harbor business misc, by default is 0, uncomment it if you want to change it.
# # harbor_db_index: 6
# # it's optional, the db for harbor cache layer, by default is 0, uncomment it if you want to change it.
# # cache_layer_db_index: 7
# Uncomment uaa for trusting the certificate of uaa instance that is hosted via self-signed cert.
# uaa:
# ca_file: /path/to/ca
# Global proxy
# Config http proxy for components, e.g. http://my.proxy.com:3128
# Components doesn't need to connect to each others via http proxy.
# Remove component from `components` array if want disable proxy
# for it. If you want use proxy for replication, MUST enable proxy
# for core and jobservice, and set `http_proxy` and `https_proxy`.
# Add domain to the `no_proxy` field, when you want disable proxy
# for some special registry.
proxy:
http_proxy:
https_proxy:
no_proxy:
components:
- core
- jobservice
- trivy
# metric:
# enabled: false
# port: 9090
# path: /metrics
# Trace related config
# only can enable one trace provider(jaeger or otel) at the same time,
# and when using jaeger as provider, can only enable it with agent mode or collector mode.
# if using jaeger collector mode, uncomment endpoint and uncomment username, password if needed
# if using jaeger agetn mode uncomment agent_host and agent_port
# trace:
# enabled: true
# # set sample_rate to 1 if you wanna sampling 100% of trace data; set 0.5 if you wanna sampling 50% of trace data, and so forth
# sample_rate: 1
# # # namespace used to differentiate different harbor services
# # namespace:
# # # attributes is a key value dict contains user defined attributes used to initialize trace provider
# # attributes:
# # application: harbor
# # # jaeger should be 1.26 or newer.
# # jaeger:
# # endpoint: http://hostname:14268/api/traces
# # username:
# # password:
# # agent_host: hostname
# # # export trace data by jaeger.thrift in compact mode
# # agent_port: 6831
# # otel:
# # endpoint: hostname:4318
# # url_path: /v1/traces
# # compression: false
# # insecure: true
# # # timeout is in seconds
# # timeout: 10
# Enable purge _upload directories
upload_purging:
enabled: true
# remove files in _upload directories which exist for a period of time, default is one week.
age: 168h
# the interval of the purge operations
interval: 24h
dryrun: false
# Cache layer configurations
# If this feature enabled, harbor will cache the resource
# `project/project_metadata/repository/artifact/manifest` in the redis
# which can especially help to improve the performance of high concurrent
# manifest pulling.
# NOTICE
# If you are deploying Harbor in HA mode, make sure that all the harbor
# instances have the same behaviour, all with caching enabled or disabled,
# otherwise it can lead to potential data inconsistency.
cache:
# not enabled by default
enabled: false
# keep cache for one day by default
expire_hours: 24
# Harbor core configurations
# Uncomment to enable the following harbor core related configuration items.
# core:
# # The provider for updating project quota(usage), there are 2 options, redis or db,
# # by default is implemented by db but you can switch the updation via redis which
# # can improve the performance of high concurrent pushing to the same project,
# # and reduce the database connections spike and occupies.
# # By redis will bring up some delay for quota usage updation for display, so only
# # suggest switch provider to redis if you were ran into the db connections spike around
# # the scenario of high concurrent pushing to same project, no improvement for other scenes.
# quota_update_provider: redis # Or db
预配置和部署
[root@kylin01 harbor]# ./prepare && ./install.sh
.......
[Step 5]: starting Harbor ...
[+] Running 10/10
✔ Network harbor_harbor Created 0.1s
✔ Container harbor-log Started 0.3s
✔ Container harbor-portal Started 0.6s
✔ Container registry Started 0.6s
✔ Container harbor-db Started 0.6s
✔ Container redis Started 0.6s
✔ Container registryctl Started 0.6s
✔ Container harbor-core Started 0.8s
✔ Container harbor-jobservice Started 1.0s
✔ Container nginx Started 1.0s
✔ ----Harbor has been installed and started successfully.----
查看harbor状态
[root@kylin01 harbor]# docker compose ps
NAME IMAGE COMMAND SERVICE CREATED STATUS PORTS
harbor-core goharbor/harbor-core:v2.12.3 "/harbor/entrypoint.…" core About a minute ago Up About a minute (healthy)
harbor-db goharbor/harbor-db:v2.12.3 "/docker-entrypoint.…" postgresql About a minute ago Up About a minute (healthy)
harbor-jobservice goharbor/harbor-jobservice:v2.12.3 "/harbor/entrypoint.…" jobservice About a minute ago Up About a minute (healthy)
harbor-log goharbor/harbor-log:v2.12.3 "/bin/sh -c /usr/loc…" log About a minute ago Up About a minute (healthy) 127.0.0.1:1514->10514/tcp
harbor-portal goharbor/harbor-portal:v2.12.3 "nginx -g 'daemon of…" portal About a minute ago Up About a minute (healthy)
nginx goharbor/nginx-photon:v2.12.3 "nginx -g 'daemon of…" proxy About a minute ago Up About a minute (healthy) 0.0.0.0:80->8080/tcp, :::80->8080/tcp
redis goharbor/redis-photon:v2.12.3 "redis-server /etc/r…" redis About a minute ago Up About a minute (healthy)
registry goharbor/registry-photon:v2.12.3 "/home/harbor/entryp…" registry About a minute ago Up About a minute (healthy)
registryctl goharbor/harbor-registryctl:v2.12.3 "/home/harbor/start.…" registryctl About a minute ago Up About a minute (healthy)
2.3 helm安装高可用harbor
2.3.1 环境配置
Kubernetes cluster 1.10+
Helm 2.8.0+
Highly available ingress controller
Highly available PostgreSQL 9.6+
Highly available Redis
PVC that can be shared across nodes or external object storage
2.3.2 架构
Harbor 的大部分组件现在都是无状态的,所以我们可以简单地增加 Pod 的副本数,确保组件分布到多个 Worker 节点上,并利用 K8S 的 Service 机制来保证 Pod 之间的连通性。
至于存储层,期望用户为应用程序数据提供高可用的 PostgreSQL 和 Redis 集群,以及用于存储图像和图表的 PVC 或对象存储。
2.3.3 helm chart 安装
helm repo add harbor https://helm.goharbor.io
helm fetch harbor/harbor --untar
2.3.4 配置values.yaml
配置 ingress urlexpose.ingress.hosts.core
配置 external url
配置 external postgresql
将
database.type
设置为external
,并在database.external
部分填写信息。需要创建一个空数据库,默认情况下数据库设置为
registry
,但可以通过设置coreDatabase
来更改。
配置 external redis
将
redis.type
设置为external
,并在redis.external
部分填写信息暂不支持TLS或Redis Cluster
配置 storage
建议使用支持以
ReadWriteMany
方式跨节点共享的StorageClass
来配置用于存储图像、图表和作业日志的卷,这样可以按需扩展组件。如果此类卷类型不是您的默认 StorageClass,则需要在以下位置进行设置:persistence.persistentVolumeClaim.registry.storageClass
persistence.persistentVolumeClaim.chartmuseum.storageClass
persistence.persistentVolumeClaim.jobservice.storageClass
.
如果使用这样的
StorageClass
,则需要将以下字段的相关 accessMode 设置为ReadWriteMany
:persistence.persistentVolumeClaim.registry.accessMode
persistence.persistentVolumeClaim.chartmuseum.accessMode
persistence.persistentVolumeClaim.jobservice.accessMode
或者,通过设置以下方式使用现有的 PVC 来存储数据:
persistence.persistentVolumeClaim.registry.existingClaim
persistence.persistentVolumeClaim.chartmuseum.existingClaim
persistence.persistentVolumeClaim.jobservice.existingClaim
最后,如果您没有支持
ReadWriteMany
StorageClass 或者不希望使用,可以使用外部对象存储来存储图像和图表,并将作业日志存储在数据库中。要启用外部对象存储,请将persistence.imageChartStorage.type
设置为您想要的值,并填写相应的部分,并将jobservice.jobLogger
设置为database
配置replicas
Set
portal.replicas
,core.replicas
,jobservice.replicas
,registry.replicas
,chartmuseum.replicas
, ton
(n
>=2).
2.3.5 安装
helm install my-harbor harbor/
my-harbor 为发布时的名称,可自定义
参考链接: