We constantly evolve our Fast Data product: making enhancements, adding features, fixing bugs, or supporting new Confluent Platform releases. Both updates and upgrades can be challenging, especially in a production environment. In this section you will find advice that applies to all updates and upgrades of the Fast Data CSD and the accompanying parcel (FAST_DATA_KAFKA_CONFLUENT), as well as specific instructions for the version you update to.
We define an update as any new release that does not require the cluster administrator to follow a special procedure. We restrict updates to releases within the same Confluent Platform major version that do not alter the way the Fast Data service is configured and operates.
We define an upgrade as any new release that may require the cluster administrator to follow a special procedure, adds new, incompatible features, or needs Kafka developers to update their code.
Both updates and upgrades will cause Cloudera Manager to ask for a Fast Data service restart. Although you can perform a rolling restart, in a production environment you may prefer to avoid such actions, especially if your clients cannot gracefully handle the temporary loss of a broker. It is up to the cluster administrator to decide whether the update is needed and justifies a service restart.
We keep a detailed Changelog with the changes of each update/upgrade that you can consult before deciding on an update.
Throughout this document we advise a rolling restart of the brokers. Ideally the administrator restarts only one broker at first, in order to detect any problems. If that broker restarts successfully, the rest of the brokers can be restarted with Cloudera Manager’s rolling restart. If a problem is detected, the update or upgrade should stop until the problem is fixed. If the cluster is set up for high availability, this method minimizes the possibility of downtime.
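The "one broker first, then the rest" approach can be sketched as follows. This is only an illustration of the control flow, not something Fast Data or Cloudera Manager provides: `restart_canary_first` and the `restart_and_check` callback are hypothetical names, and the callback stands in for whatever you use to restart a broker and confirm that it is healthy.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def restart_canary_first(
    brokers: Sequence[str],
    restart_and_check: Callable[[str], bool],
) -> Tuple[List[str], Optional[str]]:
    """Restart brokers one at a time and stop at the first failure.

    The first broker acts as the canary: if it does not come back
    healthy, no other broker is touched. `restart_and_check` is a
    hypothetical callback that restarts one broker and returns True
    only when the broker is healthy again.

    Returns (brokers restarted successfully, broker that failed or None).
    """
    restarted: List[str] = []
    for broker in brokers:
        if not restart_and_check(broker):
            # Stop here so the problem can be fixed before continuing.
            return restarted, broker
        restarted.append(broker)
    return restarted, None


if __name__ == "__main__":
    # Simulated run: the second broker fails, so the third is never restarted.
    done, failed = restart_canary_first(
        ["broker-1", "broker-2", "broker-3"],
        lambda name: name != "broker-2",
    )
    print("restarted:", done, "failed:", failed)
```

Returning the failed broker instead of raising keeps the decision about what to do next (retry, roll back, investigate) with the administrator.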
For the CSD, updates may contain bug fixes or enhancements. Unless specified otherwise, a rolling restart of your brokers and then of the rest of your services is all that is required.
To update the CSD, the cluster administrator should copy the new version to the Cloudera Manager server (see Install the CSD to Cloudera Manager), delete the old version and restart Cloudera Manager.
Then the administrator should perform a (rolling) restart of the brokers and then of the rest of the roles. A non-rolling restart is also possible, but it leads to cluster downtime. Please restart one broker first to check for potential problems.
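The file swap at the start of this procedure (copy the new CSD, delete the old one) can be sketched as below. This is a minimal sketch under assumptions: `replace_csd` is a hypothetical helper, the jar file names in the demo are made up, and the real target directory would normally be the CSD directory of your Cloudera Manager server (“/opt/cloudera/csd/” in the upgrade steps later in this section). Restarting Cloudera Manager and the roles is not covered by the code.

```python
import shutil
import tempfile
from pathlib import Path


def replace_csd(new_csd: Path, old_csd_name: str, csd_dir: Path) -> Path:
    """Copy `new_csd` into `csd_dir` and remove the old CSD file, if present.

    Returns the path of the installed CSD. Cloudera Manager still has to
    be restarted afterwards so that it loads the new CSD.
    """
    target = csd_dir / new_csd.name
    shutil.copy2(new_csd, target)
    old = csd_dir / old_csd_name
    # Never delete the file we just installed, even if the names match.
    if old.exists() and old != target:
        old.unlink()
    return target


if __name__ == "__main__":
    # Dry run in a temporary directory with made-up file names.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        csd_dir = root / "csd"
        csd_dir.mkdir()
        (csd_dir / "example-csd-old.jar").write_text("old")
        new_file = root / "example-csd-new.jar"
        new_file.write_text("new")
        replace_csd(new_file, "example-csd-old.jar", csd_dir)
        print(sorted(p.name for p in csd_dir.iterdir()))
```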
For upgrades, the procedure is similar to updates with the difference that, after the new CSD is installed and Cloudera Manager restarted, the administrator has to follow the specific instructions provided before restarting the brokers and the rest of the roles.
If there is a major update to Kafka (e.g. from 0.10.1.x to 0.10.2.x), the minimum procedure is usually to pin the inter-broker protocol version and the log message format version, restart the brokers, unpin the inter-broker protocol version and, if possible, upgrade the clients, then restart the brokers and the other services again. This is the standard procedure described in both the Kafka and Confluent documentation. Fast Data lets the administrator perform it from a central point, through a web interface and under close supervision.
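The two-pass procedure can be written down as data, which makes it easier to see which broker options are pinned in each pass. The sketch below is illustrative only: `Phase`, `rolling_upgrade_phases` and `as_properties` are hypothetical names, and keeping `log.message.format.version` pinned in the second pass is our reading of the text above (it is unpinned later, once the clients support the new format).

```python
from typing import Dict, List, NamedTuple


class Phase(NamedTuple):
    overrides: Dict[str, str]  # broker options pinned during this pass
    restart: List[str]         # what gets restarted at the end of the pass


def rolling_upgrade_phases(current_version: str) -> List[Phase]:
    """Return the two passes of a rolling Kafka upgrade.

    `current_version` is the version the brokers run before the
    upgrade, e.g. "0.10.1".
    """
    return [
        # Pass 1: new binaries, old protocol and old message format.
        Phase(
            overrides={
                "inter.broker.protocol.version": current_version,
                "log.message.format.version": current_version,
            },
            restart=["brokers"],
        ),
        # Pass 2: unpin the inter-broker protocol; the message format
        # stays pinned until the clients have been upgraded.
        Phase(
            overrides={"log.message.format.version": current_version},
            restart=["brokers", "other services"],
        ),
    ]


def as_properties(overrides: Dict[str, str]) -> str:
    """Render overrides as key=value lines, sorted by key."""
    return "\n".join(f"{key}={value}" for key, value in sorted(overrides.items()))


if __name__ == "__main__":
    for number, phase in enumerate(rolling_upgrade_phases("0.10.1"), start=1):
        print(f"# pass {number}: restart {', '.join(phase.restart)}")
        print(as_properties(phase.overrides))
```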
For the parcel, updates may contain minor software updates or enhancements to the parcel installation. Upgrades contain major software updates (e.g. from Kafka 0.10.1.x to Kafka 0.10.2.x).
Updates usually show up automatically in Cloudera Manager. At a convenient time, the cluster administrator may download, distribute and activate the new parcel. Once this is finished, a rolling restart of the brokers and then of the rest of the services should be performed.
Parcel upgrades are possible only when the CSD is upgraded as well. A parcel upgrade means that the new parcel version is incompatible with the current CSD version.
Such parcel upgrades will only show in Cloudera Manager after the CSD is upgraded, and the instructions for the CSD upgrade should be followed.
This is a major upgrade in two ways: (a) the Kafka version is updated and (b) the CSD setup and control scripts feature significant changes.
Below are detailed steps for the upgrade. We encourage all clients to contact us beforehand. We can walk through the upgrade process together or keep an engineer on standby to assist should any issues arise.
This is the process that the cluster administrator should follow in order to perform a rolling upgrade (no downtime). It is more involved than usual due to the changes in the CSD, which align the configuration more closely with Kafka.
- First, write down any configuration option that you have changed from its default. Visit the Kafka (Fast Data) service in Cloudera Manager and, in the configuration tab, use the “Non-default” filter in the right side bar to see any settings you have changed. You can ignore the “broker.id” and “id” settings, which are usually created automatically. If you have set up MirrorMaker roles, you have to delete them, because this release does not include MirrorMaker as a role. The tool is still in the package, so you can use it from the command line if needed.
- The next step is to replace the old CSD with the new one on your Cloudera Manager server. Normally you copy the new CSD to “/opt/cloudera/csd/” and delete the old one. More information can be found at Install the CSD to Cloudera Manager. This is also a good opportunity to replace the Fast Data Tools CSD with the updated version. Older versions of the Tools CSD (1.2 or older) are still compatible, but version 1.3, which only works with Fast Data 3.2, is significantly better.
- Next, restart Cloudera Manager so that it can load the new CSDs. This won’t affect the operation of the cluster.
- Once Cloudera Manager has restarted, the Fast Data parcels have to be updated. Cloudera Manager will suggest restarting the Fast Data (and the Tools) service, but you must update the parcels first. Visit the parcels page in Cloudera Manager, where you should see updates for FAST_DATA_KAFKA_CONFLUENT (please install 3.2.2-oss-rc-0) and FAST_DATA_TOOLS (please install 1.3-kt0.9.2-sc0.9.1-kc0.9.2-release-0). Download, distribute and activate the new parcels, but do not restart any services yet. If you use Stream Reactor, you have to add the newer parcel repository and update the activated connectors to the version built for this release of Kafka (3.2.2). Please check Stream Reactor Connector Parcel for more info.
- If in step 1 you noted settings that did not have their default value, check your Fast Data service again now and re-apply them if needed. This step isn’t normally needed (and we do apologize for the inconvenience), but we moved some settings in order to improve the CSD. The settings that moved mainly concern ports, SSL and Kerberos, so if you left these at their default values, you are probably fine.
- Next, in the settings, find the option inter.broker.protocol.version and set it to 0.10.1. This is the protocol that your brokers are using now. It is also a good idea to set log.message.format.version to 0.10.1. Once you have done that, restart just the brokers (from the instances tab), one by one. It is better to do this manually instead of choosing the rolling restart option, so that if one broker fails you can stop the process and fix the issue. If a broker fails, try to start it again.
- Once all the brokers are restarted, clear the option inter.broker.protocol.version so that the brokers can use the new protocol. Also find the option consumer.request.timeout.ms and set it to 30 seconds. This is a workaround for a bug in REST Proxy. Without it, Kafka Topics UI and other Kafka REST clients may not work correctly (e.g. they may display some topics as empty). The option log.message.format.version should ideally be cleared, so that the brokers switch to the default (newest) message format version, once most of your clients support the new protocol (0.10.2). The Fast Data roles (Schema Registry, Kafka Connect, Kafka REST) do, of course, support the new protocol. Your custom code may have to be compiled against the latest Kafka libraries in order to support it. Please note that older clients can work even when log.message.format.version is newer, but there will be a performance hit; thus, if your cluster isn’t used at near full capacity, you may never need to set this option.
- Now restart the brokers one by one again so that they pick up the new settings, and then restart the other roles of your Kafka (Fast Data) service: Schema Registry, REST Proxy and Connect. For these you should be able to use the rolling restart option safely. We have noticed that with Confluent Platform 3.2.2 some Schema Registry instances may fail to start after the upgrade. Start them again if needed.
- Last, restart the Fast Data Tools service. It shouldn’t need any adjustment; just restart it.
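The comparison between the settings noted in the first step and the service configuration after the CSD swap can be sketched as a simple dictionary diff. This is a hedged illustration: `settings_to_reapply` is a hypothetical helper, the setting names in the demo are examples only, and both dictionaries are assumed to be typed in by hand from the Cloudera Manager configuration tab. Because some settings moved in the new CSD, a setting flagged here may simply exist under a different name, so treat the result as a checklist rather than a verdict.

```python
from typing import Dict

# Settings the steps above say can be ignored (usually auto-generated).
IGNORED = {"broker.id", "id"}


def settings_to_reapply(noted: Dict[str, str], current: Dict[str, str]) -> Dict[str, str]:
    """Return the noted non-default settings that are missing or different now.

    `noted` holds the non-default values written down before the
    upgrade; `current` holds the values seen after the new CSD is loaded.
    """
    return {
        name: value
        for name, value in noted.items()
        if name not in IGNORED and current.get(name) != value
    }


if __name__ == "__main__":
    before = {"broker.id": "1", "port": "9093", "num.partitions": "3"}
    after = {"num.partitions": "3"}
    for name, value in sorted(settings_to_reapply(before, after).items()):
        print(f"check and re-apply: {name}={value}")
```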