This is a follow-up to a previous post describing the work we are doing toward constructing a state-of-the-art data platform from the ground up; if you haven't read it yet, take a look at the previous post for context.
In my opinion, whenever you decide to incorporate an open source component into an enterprise system, it is good practice not only to read through the documentation and examples thoroughly, but also to look closely at the codebase, where you can quickly spot a few important things.
The more boxes you check when looking at the codebase, the more trust you can have in the project and its viability to become a vital part of your own system. It is also important to make sure that, if and when needed, small adjustments can be made, bugs can be corrected, patches can be issued, additional features that make sense can be developed, documentation can be improved, and so on, in an iterative and agile fashion.
That being said, you can always take the next step and actually contribute to the project. Even the smallest form of collaboration can make a big difference and benefit the whole community, for example: submitting improvements to the documentation, voting up and/or commenting on reported issues, reviewing relevant PRs, and of course, writing code for new features, bug fixes, patches, etc.
In almost any case, to contribute efficiently it helps to have a good grip on the Git workflow and concepts such as squashing, rebasing, and so on.
In the case of Airlift and Presto, these are very well-structured Java projects (especially Presto), where most things are highly automated through bots and actions: from signing the CLA to getting your changes thoroughly tested via pre-baked Docker Compose scenarios, which cover not only a single-instance topology but also production-grade configurations with critical dependencies such as Hive, HDP, etc. All of this makes it easy to focus on the task at hand and keep up the pace until the code is ready for review and promotion by the maintainers.
The two contributions described in this blog post were a bit challenging to get into an official release because they touch the core functionality of the products, so there was a lot of back and forth to make sure that any breaking changes we introduced could be easily managed through configuration and didn't create major issues for current users of these technologies.
Our PR addressed the following concerns:
This functionality has been available since release 334, after the PR was approved and merged.
Embedded Jetty allows enabling/disabling SSL hostname verification through its configuration mechanisms; within Airlift, however, the "setEndpointIdentificationAlgorithm" method is hardcoded with an "HTTPS" value, which means hostname verification will always be performed.
As a result, any system that uses Airlift as a foundation for internal communication has no control over this, even if it configures its own trust manager or uses the JVM property available at startup (-Djdk.internal.httpclient.disableHostnameVerification).
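To see why the usual knobs don't help, here is a simplified sketch of the pattern (not the actual Airlift source): the algorithm is set unconditionally on Jetty's SslContextFactory, and the JVM property mentioned above targets the JDK's built-in HTTP client rather than Jetty.

```java
import org.eclipse.jetty.util.ssl.SslContextFactory;

public final class HardcodedVerificationSketch
{
    private HardcodedVerificationSketch() {}

    public static SslContextFactory.Client createSslContextFactory()
    {
        SslContextFactory.Client sslContextFactory = new SslContextFactory.Client();
        // Hardcoded: Jetty will always perform hostname verification, regardless of
        // the trust manager or JVM properties the embedding application configures.
        sslContextFactory.setEndpointIdentificationAlgorithm("HTTPS");
        return sslContextFactory;
    }
}
```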
This is a very simple change that adds a config property allowing Airlift users to leverage embedded Jetty's configuration capabilities.
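Conceptually, the change boils down to something like the following Airlift-style config sketch. The class and property name are hypothetical, shown only to illustrate the idea; the default preserves today's behavior.

```java
import io.airlift.configuration.Config;
import org.eclipse.jetty.util.ssl.SslContextFactory;

public class HostnameVerificationConfigSketch
{
    private boolean verifyHostnames = true; // default keeps the current behavior

    public boolean isVerifyHostnames()
    {
        return verifyHostnames;
    }

    @Config("https.verify-hostnames") // hypothetical property name, for illustration only
    public HostnameVerificationConfigSketch setVerifyHostnames(boolean verifyHostnames)
    {
        this.verifyHostnames = verifyHostnames;
        return this;
    }

    public void configure(SslContextFactory.Client sslContextFactory)
    {
        // Passing null skips the hostname check while keeping certificate chain
        // validation against the configured trust store fully in place.
        sslContextFactory.setEndpointIdentificationAlgorithm(verifyHostnames ? "HTTPS" : null);
    }
}
```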
The justification is that there are use cases that require disabling SSL hostname verification (or simply don't need it) without bypassing the whole certificate chain trust process.
A good example is a system or application that follows the SPIFFE/SPIRE standard, where certificate chain trust is enforced but vanilla hostname verification (based on the certificate CN) is not necessary, since the SPIFFE URI in the certificate's SAN is a much more effective and flexible way to prevent man-in-the-middle attacks.
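For context, here is a minimal sketch of what checking the SPIFFE URI in the SAN looks like, compared to a CN-based hostname check. The class, method, and SPIFFE ID below are my own examples, not taken from the actual PR.

```java
import java.security.cert.CertificateParsingException;
import java.security.cert.X509Certificate;
import java.util.Collection;
import java.util.List;

public final class SpiffeIdCheckSketch
{
    // getSubjectAlternativeNames() returns (type, value) pairs;
    // type 6 is a uniformResourceIdentifier (RFC 5280).
    private static final int SAN_URI_TYPE = 6;

    private SpiffeIdCheckSketch() {}

    // Instead of comparing the peer hostname against the certificate CN, verify that
    // the certificate carries the expected SPIFFE ID in its Subject Alternative Names,
    // e.g. "spiffe://example.org/my-service" (an example ID, not from the original post).
    public static boolean hasSpiffeId(X509Certificate certificate, String expectedSpiffeId)
            throws CertificateParsingException
    {
        Collection<List<?>> subjectAlternativeNames = certificate.getSubjectAlternativeNames();
        if (subjectAlternativeNames == null) {
            return false;
        }
        for (List<?> san : subjectAlternativeNames) {
            if (((Integer) san.get(0)) == SAN_URI_TYPE && expectedSpiffeId.equals(san.get(1))) {
                return true;
            }
        }
        return false;
    }
}
```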
This functionality is available from Airlift release 198.
Thanks for reading!!