VMware Cloud Director 10.4.1 (VCD) has just been released, and with it the appliance's internal PostgreSQL database changes from version 10 to 14 (the replication manager is upgraded as well). This means that the standard upgrade process has changed: it is now a bit more complicated and could take significantly longer than in the past, so pay attention to this change. The goal of this article is to explain what is going on in the background so that it is clearer why certain steps are necessary.
VCD 10.4.1 is still interoperable with PostgreSQL 10-14 if you use the Linux deployment format, so this blog applies only if you use the appliance deployment. PostgreSQL version 10 is no longer maintained, which is the main reason for the switch, besides the fact that newer is always better :-).
- The database upgrade will happen during the vamicli update process.
- All appliance nodes must be up but the vmware-vcd service should be shut down.
- It is always recommended to take cold snapshots of all nodes before the upgrade (make sure the snapshots are taken while all DB nodes are off at the same time, as you do not want to restore a snapshot of the primary to an older state while the secondary nodes are ahead).
- The primary database appliance node is where you have to start the upgrade process. It will most likely take the longest, as the new database version 14 is installed side-by-side and all the data is converted. That means you will also need enough free space on the database partition. You can check by running on the primary DB node:
df -h|grep postgres
/dev/mapper/database_vg-vpostgres 79G 17G 58G 23% /var/vmware/vpostgres
The above shows that the partition size is 79 GB, the DB is currently using 17 GB and I have 58 GB free. So I am good to go. The actual additional space needed is less than 17 GB, as the database logs and write-ahead logs are not copied over, so their size can be deducted. You can quickly get their size by running:
du -B G --summarize /var/vmware/vpostgres/current/pgdata/pg_wal/
du -B G --summarize /var/vmware/vpostgres/current/pgdata/log/
Or just use this one-liner:
du -sh /var/vmware/vpostgres/current/pgdata/ --exclude=log --exclude=pg_wal
If needed, the DB partition can be easily expanded.
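The space check above can be wrapped into a small script. This is only a sketch: the paths come from the commands in this article, while the function names (data_kib, free_kib, has_room) are made up for illustration, and the comparison is a rough estimate rather than the exact check the upgrade itself performs.

```shell
#!/bin/sh
# Sketch: compare the size of the data to be converted with the free space
# on the database partition. Function names are hypothetical.

PGDATA=/var/vmware/vpostgres/current/pgdata
MOUNT=/var/vmware/vpostgres

# Size in KiB of a data directory, excluding server logs and write-ahead logs
data_kib() {
    du -sk --exclude=log --exclude=pg_wal "$1" | awk '{print $1}'
}

# Available KiB on the filesystem that holds the given path
free_kib() {
    df -Pk "$1" | awk 'NR==2 {print $4}'
}

# has_room NEEDED FREE: succeeds when FREE is strictly greater than NEEDED
has_room() {
    [ "$2" -gt "$1" ]
}

# Only run the check where the appliance data directory actually exists
if [ -d "$PGDATA" ]; then
    needed=$(data_kib "$PGDATA")
    free=$(free_kib "$MOUNT")
    echo "needed: ${needed} KiB, free: ${free} KiB"
    if has_room "$needed" "$free"; then
        echo "OK: enough free space for the side-by-side copy"
    else
        echo "WARNING: expand the database partition first"
    fi
fi
```

Run it as root on the primary DB node before starting the update. It only reads sizes and changes nothing.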
- During the vami upgrade process, the secondary DB nodes clone the upgraded DB from the primary via replication. These nodes simply drop their current DB and do not need any additional space.
- The DB upgrade process can be monitored by tailing the update-postgres-db.log file in another SSH session:
tail -f /opt/vmware/var/log/vcd/update-postgres-db.log
- After all nodes (DB and regular ones) are upgraded, the database schema is upgraded via the /opt/vmware/vcloud-director/bin/upgrade command. Then you must reboot all nodes (cells).
- The vcloud database password will be changed to a new autogenerated 14-character string and will be replicated to all nodes of the VCD cluster. If you use your own tooling to access the DB directly, you might want to change the password, as there is no way to retrieve the autogenerated one. This must be done by running psql in the elevated postgres account context.
root@vcloud1 [ /tmp ]# su postgres
postgres@vcloud1 [ /root ]$ psql -c "ALTER ROLE vcloud WITH PASSWORD 'VMware12345678'"
and then you must update vcd-service on each cell via the CMT reconfigure-database command. This can be done in a fan-out mode from a single cell live by running:
/opt/vmware/vcloud-director/bin/cell-management-tool reconfigure-database -dbpassword 'VMware12345678' --private-key-path=/opt/vmware/vcloud-director/id_rsa --remote-sudo-user=postgres -i
The command above will change DB configuration properties on the local and all remote cells. It will also refresh the running service to use the new password.
Note the DB password must have at least 14 characters.
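Since shell quoting around the password is easy to get wrong, here is a minimal sketch that checks the 14-character minimum and builds the ALTER ROLE statement with the password quoted as a SQL string literal, then passes it to psql on standard input. The function names and the NEW_DB_PASSWORD variable are hypothetical, and the script is meant to be run as the postgres user, as in the commands above.

```shell
#!/bin/sh
# Sketch: validate the new vcloud DB password and build the ALTER ROLE
# statement without nested shell quoting. Names are hypothetical.

# Succeeds when the password has at least 14 characters
long_enough() {
    [ "${#1}" -ge 14 ]
}

# Wraps the value in single quotes, doubling any embedded single quote
sql_literal() {
    printf "'%s'" "$(printf '%s' "$1" | sed "s/'/''/g")"
}

# Prints the complete SQL statement for the given password
alter_role_sql() {
    printf 'ALTER ROLE vcloud WITH PASSWORD %s;\n' "$(sql_literal "$1")"
}

# Does nothing unless NEW_DB_PASSWORD is set in the environment
if [ -n "${NEW_DB_PASSWORD:-}" ]; then
    if long_enough "$NEW_DB_PASSWORD"; then
        # The statement goes to psql on stdin, so bash never re-parses it
        alter_role_sql "$NEW_DB_PASSWORD" | psql
    else
        echo "password must have at least 14 characters" >&2
        exit 1
    fi
fi
```

The same password still has to be passed to the reconfigure-database command afterwards. Alternatively, the psql meta-command \password vcloud prompts for the new password interactively and avoids shell quoting altogether.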
- Any advanced PostgreSQL configuration options will not be retained. In fact they may be incompatible with PostgreSQL 14 (they are backed up in /var/vmware/vpostgres/current/pgdata/postgresql.auto.old)
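To see which custom settings were dropped, the backup file can be listed without its comments. This is a sketch that assumes the backup uses the usual postgresql.auto.conf layout (one name = value pair per line, comments starting with #); the list_settings helper is hypothetical.

```shell
#!/bin/sh
# Sketch: show the settings saved in the backup of the old configuration.
# Assumes one "name = value" pair per line and "#" comments.

OLD_CONF=/var/vmware/vpostgres/current/pgdata/postgresql.auto.old

# Print only active lines, skipping comments and blank lines
list_settings() {
    grep -Ev '^[[:space:]]*(#|$)' "$1"
}

# Only runs where the backup file exists
if [ -f "$OLD_CONF" ]; then
    echo "Settings that were not carried over:"
    list_settings "$OLD_CONF" || echo "(none)"
fi
```

Review each listed option against the PostgreSQL 14 documentation before re-applying it, since some may no longer be valid.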
3 thoughts on “VMware Cloud Director 10.4.1 Appliance Upgrade Notes”
root@atl1-kepler-vcd1 [ ~ ]# su – postgres
postgres@atl1-kepler-vcd1 [ ~ ]$ psql -c “ALTER ROLE vcloud WITH PASSWORD ‘Welcome@12345678′”
psql.bin: warning: extra command-line argument “WITH” ignored
psql.bin: warning: extra command-line argument “PASSWORD” ignored
psql.bin: warning: extra command-line argument “‘Welcome@12345678′”” ignored
psql.bin: error: connection to server on socket “/tmp/.s.PGSQL.5432” failed: FATAL: Peer authentication failed for user “vcloud”
If you use special characters, bash may break the psql command. There is probably a way to escape them, but it is easier to just run psql without any arguments and then run the quoted command in the psql shell.
I’ve been looking forward to this release as I have a cell group that I cannot upgrade due to some famous bug with the care packages. I attempted the upgrade on release, and now I am suffering from another bug with JWT, so the cells won’t start.
Since this is another bug, I have to wait indefinitely for the development group to fix it, and that further puts me in the throes of the NSX-T migration needs. I’ve already waited 2 months for the 10.4.1 release.
I’m sad. Ticket open with support. Just overly frustrated with the quality of releases at this time.
I really enjoy what you post sir. Tremendously great information for my team as a whole. Thank you!