This is a follow-up to a previous post describing the work we are doing toward constructing a state-of-the-art data platform from the ground up; if you haven't read it yet, take a look at the previous post for context.
In my opinion, whenever you decide to incorporate an open source component into an enterprise system, it is good practice not only to read through the documentation and examples thoroughly, but also to look closely at the codebase, where you can quickly spot a few important things.
The more boxes you check when looking at the codebase, the more trust you can have in the project and its viability to become a vital part of your own system. It is also important to make sure that, if and when needed, small adjustments can be made, bugs can be corrected, patches can be issued, additional features that make sense can be developed, documentation can be improved, and so on, in an iterative and agile fashion.
That being said, you can always take the next step and actually contribute to the project. Even the smallest form of collaboration can make a big difference and benefit the whole community, for example: submitting improvements to the documentation, voting up and/or commenting on reported issues, reviewing relevant PRs, and of course, writing code for new features, bug fixes, patches, etc.
In almost any case, to contribute efficiently it helps to have a good grip on the Git workflow and concepts such as squashing, rebasing, and so on.
In the case of Airlift and Presto, these are very well-structured Java projects (especially Presto), where most things are highly automated through bots and actions: from signing the CLA to getting your changes thoroughly tested via pre-baked Docker Compose scenarios, which cover not only a single-instance topology but also production-grade configurations with critical dependencies such as Hive, HDP, etc. All of this makes it easy to focus on the task at hand and keep up the pace until the code is ready for review and promotion by the maintainers.
The two contributions described in this blog post were a bit challenging to get into an official release because they touch the core functionality of the products, so there was a lot of back and forth to make sure that any breaking changes we introduced could be easily managed through configuration and didn't create major issues for current users of these technologies.
Our PR addressed the following concerns:
This functionality has been available since release 334, after the PR was approved and merged.
Embedded Jetty allows enabling/disabling SSL hostname verification through its configuration mechanisms; within Airlift, however, the "setEndpointIdentificationAlgorithm" method is hardcoded with an "HTTPS" value, which means hostname verification will always be performed.
As a result, any system that uses Airlift as a foundation for internal communication has no control over this, even if it configures its own trust manager or uses the JVM property available at startup (-Djdk.internal.httpclient.disableHostnameVerification).
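To see why the usual knobs don't help, here is a simplified sketch of the pattern (not the actual Airlift source): the algorithm is set unconditionally on Jetty's SslContextFactory, and the JVM property mentioned above targets the JDK's built-in HTTP client rather than Jetty.

```java
import org.eclipse.jetty.util.ssl.SslContextFactory;

public final class HardcodedVerificationSketch
{
    private HardcodedVerificationSketch() {}

    public static SslContextFactory.Client createSslContextFactory()
    {
        SslContextFactory.Client sslContextFactory = new SslContextFactory.Client();
        // Hardcoded: Jetty will always perform hostname verification, regardless of
        // the trust manager or JVM properties the embedding application configures.
        sslContextFactory.setEndpointIdentificationAlgorithm("HTTPS");
        return sslContextFactory;
    }
}
```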
This is a very simple change that adds a config property allowing Airlift users to leverage embedded Jetty's configuration capabilities.
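Conceptually, the change boils down to something like the following Airlift-style config sketch. The class and property name are hypothetical, shown only to illustrate the idea; the default preserves today's behavior.

```java
import io.airlift.configuration.Config;
import org.eclipse.jetty.util.ssl.SslContextFactory;

public class HostnameVerificationConfigSketch
{
    private boolean verifyHostnames = true; // default keeps the current behavior

    public boolean isVerifyHostnames()
    {
        return verifyHostnames;
    }

    @Config("https.verify-hostnames") // hypothetical property name, for illustration only
    public HostnameVerificationConfigSketch setVerifyHostnames(boolean verifyHostnames)
    {
        this.verifyHostnames = verifyHostnames;
        return this;
    }

    public void configure(SslContextFactory.Client sslContextFactory)
    {
        // Passing null skips the hostname check while keeping certificate chain
        // validation against the configured trust store fully in place.
        sslContextFactory.setEndpointIdentificationAlgorithm(verifyHostnames ? "HTTPS" : null);
    }
}
```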
The justification is that there are use cases that require disabling SSL hostname verification (or simply don't need it) without bypassing the whole certificate chain trust process.
A good example is a system or application that follows the SPIFFE/SPIRE standard, where certificate chain trust is enforced but vanilla hostname verification (based on the certificate CN) is not necessary, since the SPIFFE URI in the certificate's SAN is a much more effective and flexible way to prevent man-in-the-middle attacks.
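For context, here is a minimal sketch of what checking the SPIFFE URI in the SAN looks like, compared to a CN-based hostname check. The class, method, and SPIFFE ID below are my own examples, not taken from the actual PR.

```java
import java.security.cert.CertificateParsingException;
import java.security.cert.X509Certificate;
import java.util.Collection;
import java.util.List;

public final class SpiffeIdCheckSketch
{
    // getSubjectAlternativeNames() returns (type, value) pairs;
    // type 6 is a uniformResourceIdentifier (RFC 5280).
    private static final int SAN_URI_TYPE = 6;

    private SpiffeIdCheckSketch() {}

    // Instead of comparing the peer hostname against the certificate CN, verify that
    // the certificate carries the expected SPIFFE ID in its Subject Alternative Names,
    // e.g. "spiffe://example.org/my-service" (an example ID, not from the original post).
    public static boolean hasSpiffeId(X509Certificate certificate, String expectedSpiffeId)
            throws CertificateParsingException
    {
        Collection<List<?>> subjectAlternativeNames = certificate.getSubjectAlternativeNames();
        if (subjectAlternativeNames == null) {
            return false;
        }
        for (List<?> san : subjectAlternativeNames) {
            if (((Integer) san.get(0)) == SAN_URI_TYPE && expectedSpiffeId.equals(san.get(1))) {
                return true;
            }
        }
        return false;
    }
}
```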
This functionality is available from Airlift release 198.
Thanks for reading!!