
· 6 min read
Idriss Neumann

In this blog post, I'll try to explain why we moved from ElasticStack to Quickwit and Grafana, and why we chose them over other solutions.

First, we've been in the observability world for quite some time and have been using ElasticStack for years. I personally have used Elasticsearch for more than 10 years, and Apache Solr before that, for logging and observability use cases, even before Elasticsearch was born!

We also successfully used ElasticStack for IoT (Internet of Things) projects and rebuilt our own Kibana and Elasticsearch images for ARM32 and ARM64 before Elastic (the company) started releasing official images. We had a lot of fun with it.


However, anyone who runs it on premises knows that Elasticsearch is a big distributed system that brings a lot of struggles, such as:

  • Log retention, because the data lives on the filesystem and disk storage is expensive1
  • Like most highly distributed databases written in Java, it has a very large footprint and consumes a lot of RAM...
  • You can also run into issues such as "split brain" when dealing with HA (High Availability)
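A "split brain" happens when a network partition lets both sides elect their own master and diverge. The usual defense is a majority quorum: Elasticsearch versions before 7.0 asked you to set `discovery.zen.minimum_master_nodes` to exactly that majority yourself (7.x manages its voting configuration automatically). Here's a minimal Python sketch of the arithmetic; the function names are ours, for illustration only:

```python
def quorum(master_eligible_nodes: int) -> int:
    """Smallest strict majority of the master-eligible nodes."""
    return master_eligible_nodes // 2 + 1


def tolerated_failures(master_eligible_nodes: int) -> int:
    """How many master-eligible nodes can be lost while a majority survives."""
    return master_eligible_nodes - quorum(master_eligible_nodes)


def split_brain_possible(master_eligible_nodes: int, min_master_nodes: int) -> bool:
    """True if some two-way network partition lets BOTH sides elect a master,
    given the minimum number of master-eligible nodes a side needs (>= 1)."""
    for left in range(master_eligible_nodes + 1):
        right = master_eligible_nodes - left
        if left >= min_master_nodes and right >= min_master_nodes:
            return True
    return False


if __name__ == "__main__":
    for n in (2, 3, 5):
        print(
            f"{n} nodes: quorum={quorum(n)}, tolerated failures={tolerated_failures(n)}, "
            f"split brain with min=1: {split_brain_possible(n, 1)}, "
            f"with min=quorum: {split_brain_possible(n, quorum(n))}"
        )
```

With a majority threshold, two disjoint partitions can never both reach it. The same arithmetic shows why a 2-node cluster tolerates no failure at all and why 3 master-eligible nodes is the usual minimum for HA.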

On the other hand, there are SaaS (Software as a Service) observability solutions such as Datadog or Elastic Cloud, which save you the trouble of managing clusters but are very expensive. And even putting the price aside, most of our customers are required to keep all their data on infrastructure they own.

That being said, Grafana offers an alternative called Grafana Loki, which stores the data on object storage. Using object storage is a great idea because most of the big cloud providers implement HA for it by design, and it lowers the price a lot. Moreover, even on premises, you often want to ensure HA for as few components as possible, object storage being one of them.

However, we weren't convinced, because Loki doesn't implement a real search engine like Apache Lucene, which both Elasticsearch and Solr use. It also appears to be very slow, with bad feedback from the community such as this one.

So we were looking for a solution that combines the advantages of both worlds: an efficient search engine that compensates for the slowness of object storage APIs.

That's when we discovered Quickwit \o/.


Quickwit is built on top of Tantivy, which is similar to Lucene but written in Rust2, and it stores the indexed data on object storage. In my opinion, that combination is the main reason Quickwit is better than Loki3 and Elasticsearch.

Quickwit also brings a lot of integrations with the CNCF ecosystem4:

  • A datasource for Grafana
  • OpenTelemetry interoperability for trace and log ingestion
  • Interoperability with Jaeger's gRPC API, which lets us use Quickwit as a storage backend for traces while keeping the Jaeger UI or the Jaeger datasource in Grafana. It's the only solution we know of to store Jaeger traces on object storage
  • Interoperability with the Elasticsearch and OpenSearch5 APIs
  • Falcosidekick, which can use Quickwit as an output
  • Glasskube, which makes installing Quickwit on Kubernetes easier6
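To give an idea of what the Elasticsearch API interoperability means in practice, here's a minimal Python sketch that builds an Elasticsearch-style `_bulk` request (newline-delimited JSON: one action line, then one document line) aimed at Quickwit's Elasticsearch-compatible endpoint. The node URL `http://localhost:7280` (Quickwit's default REST port), the `/api/v1/_elastic/_bulk` path and the `my-logs` index are assumptions to check against the Quickwit documentation for your version. The request is only built here, not sent:

```python
import json
import urllib.request

# Assumption: a local Quickwit node listening on its default REST port.
QUICKWIT_URL = "http://localhost:7280"


def build_bulk_payload(index: str, docs: list[dict]) -> bytes:
    """Encode docs in the Elasticsearch _bulk NDJSON format."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"create": {"_index": index}}))  # action line
        lines.append(json.dumps(doc))                             # document line
    # A _bulk body must end with a trailing newline.
    return ("\n".join(lines) + "\n").encode("utf-8")


def build_bulk_request(index: str, docs: list[dict]) -> urllib.request.Request:
    """Prepare (but do not send) a POST to the assumed ES-compatible bulk endpoint."""
    return urllib.request.Request(
        url=f"{QUICKWIT_URL}/api/v1/_elastic/_bulk",
        data=build_bulk_payload(index, docs),
        headers={"Content-Type": "application/x-ndjson"},
        method="POST",
    )


if __name__ == "__main__":
    # "my-logs" is a hypothetical index name.
    req = build_bulk_request("my-logs", [{"message": "hello", "level": "info"}])
    print(req.get_method(), req.full_url)
    print(req.data.decode())
    # urllib.request.urlopen(req) would actually send it to a running node.
```

The point is that existing Elasticsearch clients and shippers that speak this wire format can often be pointed at Quickwit with little more than a URL change.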


That's why we decided to offer Quickwit as the main observability solution in our cwcloud DaaS (Deployment as a Service) platform. You can check out this tutorial for more information.


We also started migrating most of our customers' infrastructures to Quickwit instances, and we recommend designing new applications with the OpenTelemetry SDK available for their stack when possible, or using Vector from Datadog, which brings a lot of advantages as well:

  • It's very fast and has a very low footprint compared to other well-known solutions such as Fluent Bit, Logstash and even Filebeat from ElasticStack (probably because it's written in Rust :p ).
  • It provides a very powerful language, VRL (Vector Remap Language), to remap your logs and make them compliant with existing index mappings7.
  • It works with Kubernetes but also with docker, and even with logs written to the filesystem by legacy applications. This is very convenient for us because, as explained in my previous blog post "Docker in production, is it really bad?", we have a lot of customers who use docker in production (through cwcloud's DaaS) instead of Kubernetes.
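The remapping mentioned above is written in VRL inside Vector. To show the idea without Vector, here's the same kind of transformation sketched in Python: it turns one line produced by docker's `json-file` logging driver (`{"log": ..., "stream": ..., "time": ...}`) into a document shaped like Quickwit's `otel-logs-v0_7` index. The target field names (`timestamp_nanos`, `service_name`, `body`, `attributes`) are our assumption of that mapping, and using the container name as `service_name` is a hypothetical choice; check the index's doc mapping before relying on it:

```python
import calendar
import json
from datetime import datetime


def rfc3339_to_nanos(ts: str) -> int:
    """Convert a UTC RFC 3339 timestamp (up to 9 fractional digits) to epoch nanoseconds."""
    if not ts.endswith("Z"):
        raise ValueError(f"expected a UTC timestamp ending with 'Z': {ts!r}")
    base, _, frac = ts[:-1].partition(".")
    # timegm interprets the struct_time as UTC.
    seconds = calendar.timegm(datetime.strptime(base, "%Y-%m-%dT%H:%M:%S").timetuple())
    # Right-pad the fraction to 9 digits (nanoseconds) and drop anything beyond.
    nanos = int(frac.ljust(9, "0")[:9]) if frac else 0
    return seconds * 1_000_000_000 + nanos


def remap_docker_line(line: str, container_name: str) -> dict:
    """Map one docker json-file log line to an otel-logs-like document (assumed field names)."""
    record = json.loads(line)
    return {
        "timestamp_nanos": rfc3339_to_nanos(record["time"]),
        # Hypothetical: the container name stands in for the OpenTelemetry service name.
        "service_name": container_name,
        # docker keeps the trailing newline of each log line; strip it.
        "body": {"message": record["log"].rstrip("\n")},
        "attributes": {"stream": record["stream"]},
    }


if __name__ == "__main__":
    raw = '{"log":"GET / 200\\n","stream":"stdout","time":"2024-05-02T10:15:30.123456789Z"}'
    print(json.dumps(remap_docker_line(raw, "my-app"), indent=2))
```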

For most of them, as for our own internal use, we have cut compute consumption by at least a factor of 3 while increasing retention. Larger companies have successfully built huge logging services with Quickwit, such as Binance with 100PB of stored data.

So now Quickwit covers our observability needs in terms of logs and traces, but we still lack metrics. For the metrics use case, we're using VictoriaMetrics, which works pretty well but lacks object storage support. We know that Quickwit plans to handle this use case one day with a real TSDB (Time Series Database), which sounds really promising. I'm quite convinced that separating compute from storage and offering object storage is now a key success factor for building modern observability solutions.

To conclude, I still think ElasticStack is a great product, backed by a bigger company that provides more advanced features, including AI (Artificial Intelligence) capabilities. I might still offer it to customers who are interested in some of those features, or who need Elasticsearch as a full-text search engine for some applications or microservices (Quickwit isn't the best choice in that case; it's better suited to observability use cases only).


  1. We know that Elasticsearch provides object storage compatibility with the searchable snapshots feature, but on one hand it isn't available in the open source version, and on the other hand it's only recommended for cold data that isn't supposed to be fetched often.
  2. According to this benchmark, Tantivy is 2x faster than Lucene, which compensates for the slowness of object storage.
  3. Quickwit also provides this benchmark against Loki, trying to make a fair comparison.
  4. I'm personally involved in contributing to a lot of them, on assignment from Quickwit Inc. (the company).
  5. OpenSearch is a fork of ElasticStack initiated by Amazon AWS.
  6. I wrote a blog post on the Quickwit blog if you want more information.
  7. You can see an example of a remap function that makes docker logs compliant with the default otel-logs-v0_7 index in this tutorial.

· 8 min read
Idriss Neumann

Since the rise of Kubernetes (or K8S) and the OCI (Open Container Initiative), which standardizes containerization on Linux, we read more and more often that using docker1 as a runtime on production infrastructure has become a poor choice.

In this blog post, I'll focus on answering the criticisms that come from people in the containerization world, who are mostly convinced that K8S is the only viable way to deploy to production. There are also criticisms from people who are opposed to the very principle of containerization. I'll probably answer those another day.

In my previous blog post, "Kubernetes or not, that's the question", I already detailed how K8S and its ecosystem lower deployment complexity by taking care of many things by design (autoscaling, reconciliation loops, easier observability...) and by being the most standard IaaS (Infrastructure as a Service) API specification, available everywhere (on premises, on almost every cloud provider...). It sounds like the perfect fit for setting up a real deployment platform that makes deployment easy and seamless everywhere. By adding tools like teleport or knative, it can become a real PaaS (Platform as a Service), and the SREs (Site Reliability Engineers) operating those clusters can be seen as Platform Engineers.

So I'm not trying to convince anybody to avoid K8S, especially when it comes to building a new, modern platform at company scale or providing a multitenant service. I'm pretty convinced myself that it's probably the better choice nowadays. That's also why we provide a K8S version of our DaaS2 (Deployment as a Service) solution.

That being said, if we take a few steps back, we can see multiple advantages to using docker, and especially docker compose, in 2024.

On one hand, a lot of businesses, regardless of their size, already have applications running on virtual servers or compute engines. A good first step might be to containerize their applications and switch from process managers such as systemd or pm2 to an OCI runtime like docker or containerd. Once all apps are containerized, the lift and shift to other infrastructures, such as K8S but also CaaS (Container as a Service) like ECS on AWS or Cloud Run on GCP, will be easier. It's basically a "Divide and Conquer" strategy. In my experience, telling those people from the start not to use a container runtime on their existing machines might discourage them from starting a migration, despite the benefits.

On the other hand, in my opinion the compose syntax can also be seen as another standard API specification, like the K8S one. It just doesn't handle as many things as K8S. However, it might be sufficient for a lot of customers, and it's far better known by most developers.

A few years ago, during a DevoxxFR event, I heard someone say:

Docker was designed by developers to let them deploy their apps in production; K8S is the sysadmins' answer to take back control of production

It was completely true, and now the new generation of sysadmins who want to keep control are called SREs. That's not quite the same mindset as Platform Engineers, who want to give control to the feature teams. So maybe Platform Engineers should provide an easier standard API, and for me compose is a really good candidate.

Moreover, this idea isn't new. That's why kompose has existed for several years, and now Docker, Inc. (the company) is working on an experimental compose bridge project3. Docker, Inc. has also been enriching the compose specification for years, covering a lot of production requirements such as healthchecks. So in my opinion, this specification is far from being a local tool for developers only.
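To make the idea of "compose as a deployment API" more concrete, here's a deliberately tiny, hypothetical sketch of what tools like kompose or compose bridge do: map one compose service (image, replicas, healthcheck) to a K8S Deployment manifest. It is not the actual mapping of either tool; it only handles the `CMD`, `CMD-SHELL` and plain-string forms of a healthcheck test and intervals written in whole seconds (like `30s`), and it ignores everything else:

```python
import json


def parse_seconds(duration: str) -> int:
    """Parse compose durations of the simple '<n>s' form only (a simplifying assumption)."""
    if not duration.endswith("s") or not duration[:-1].isdigit():
        raise ValueError(f"unsupported duration: {duration!r}")
    return int(duration[:-1])


def healthcheck_to_probe(healthcheck: dict) -> dict:
    """Turn a compose healthcheck into a K8S exec probe."""
    test = healthcheck["test"]
    if isinstance(test, str):               # plain string: run through a shell
        command = ["/bin/sh", "-c", test]
    elif test[0] == "CMD":                  # ["CMD", "curl", "-f", ...]
        command = list(test[1:])
    elif test[0] == "CMD-SHELL":            # ["CMD-SHELL", "curl -f ... || exit 1"]
        command = ["/bin/sh", "-c", test[1]]
    else:
        raise ValueError(f"unsupported healthcheck test: {test!r}")
    probe = {"exec": {"command": command}}
    if "interval" in healthcheck:
        probe["periodSeconds"] = parse_seconds(healthcheck["interval"])
    if "retries" in healthcheck:
        probe["failureThreshold"] = healthcheck["retries"]
    return probe


def service_to_deployment(name: str, service: dict) -> dict:
    """Map a single compose service to an apps/v1 Deployment manifest."""
    labels = {"app": name}
    container = {"name": name, "image": service["image"]}
    if "healthcheck" in service:
        # Design choice for this sketch: healthchecks become liveness probes.
        container["livenessProbe"] = healthcheck_to_probe(service["healthcheck"])
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": service.get("deploy", {}).get("replicas", 1),
            "selector": {"matchLabels": labels},
            "template": {"metadata": {"labels": labels}, "spec": {"containers": [container]}},
        },
    }


if __name__ == "__main__":
    web = {
        "image": "nginx:1.27",
        "deploy": {"replicas": 2},
        "healthcheck": {"test": ["CMD", "curl", "-f", "http://localhost"], "interval": "30s", "retries": 3},
    }
    print(json.dumps(service_to_deployment("web", web), indent=2))
```

Since `kubectl apply -f -` accepts JSON, the output could be piped straight into a cluster. Whether a healthcheck should become a liveness or a readiness probe is exactly the kind of mapping rule that compose bridge lets you customize.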

In my opinion, polyglotism in deployment APIs is clearly a success factor for a PaaS (built on top of K8S or not): the more familiar deployment APIs it provides, the more it meets everyone's needs. In exactly the same way, the more programming languages and developer experiences a FaaS (Function as a Service) provides, the more it meets everyone's needs.

That being said, you might say:

Okay, using the compose specification on K8S is fine. But you were talking about using the docker engine on virtual machines, and that still doesn't give us the PaaS, CaaS or FaaS platform we expect, unlike K8S.

That sounds true, because using docker in virtual machines requires configuring and securing those machines with advanced system administration knowledge: configuring firewall rules (using iptables, ufw, firewalld, whatever), configuring a reverse proxy/load balancer in front of docker, configuring system users and their privileges, enforcing SSH connection policies... Of course, the docker runtime can take care of everything related to the resiliency of a single process, like systemd, but all the rest remains.

Indeed, it seems that if you want to stay "modern" (auditable, gitops, using Infrastructure as Code, being able to roll back a change with a git revert, etc.), you'll have to use terraform/opentofu/pulumi/whatever to provision the infrastructure and set up ansible4 to configure the virtual machines... and that's too much work compared to using a managed K8S cluster with helm charts and gitops tools like ArgoCD or FluxCD.

However, this work can be streamlined with a DaaS platform such as cwcloud, exactly the same way you share helm charts and use umbrella charts to install a tenant of your application and its dependencies. We provide a tool where you can templatize your "environments" (or deployments) using a pretty easy GUI or CLI. Here's an example of a templatized WordPress installation:

cwcloud-env-wordpress-1

cwcloud-env-wordpress-2

Once you've written your set of ansible roles, the injected variables and the documentation template, it takes only one API or CLI call (or even a single click in the GUI) to instantiate a virtual machine and perform the complete installation, with a git repository that contains all the ansible configuration and triggers update pipelines on every change (in a modern gitops approach).

From a developer's perspective, they just have to provide templates of their compose files inside an ansible role and reuse the other roles already developed and maintained by your platform engineering team. It starts to look like the way platform engineers building their platform on top of K8S work, right?
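In such a setup, the compose file shipped in an ansible role is a template whose variables are injected at deployment time. Ansible itself renders templates with Jinja2; to keep this sketch dependency-free, it uses Python's standard `string.Template` instead, and the variable names (`wp_image`, `http_port`, `db_password`) are purely hypothetical:

```python
from string import Template

# Hypothetical compose template a developer might ship inside an ansible role.
COMPOSE_TEMPLATE = Template("""\
services:
  wordpress:
    image: $wp_image
    ports:
      - "$http_port:80"
    environment:
      WORDPRESS_DB_HOST: db
      WORDPRESS_DB_USER: wordpress
      WORDPRESS_DB_PASSWORD: $db_password
      WORDPRESS_DB_NAME: wordpress
  db:
    image: mysql:8.0
    environment:
      MYSQL_DATABASE: wordpress
      MYSQL_USER: wordpress
      MYSQL_PASSWORD: $db_password
      MYSQL_RANDOM_ROOT_PASSWORD: "1"
""")


def render_compose(variables: dict) -> str:
    """Inject per-environment variables; substitute() raises KeyError if one is missing."""
    return COMPOSE_TEMPLATE.substitute(variables)


if __name__ == "__main__":
    print(render_compose({"wp_image": "wordpress:6", "http_port": 8080, "db_password": "change-me"}))
```

Failing loudly on a missing variable (rather than rendering an empty value) is the behavior you usually want when the same template is instantiated for many environments.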

Okay, that's really promising, but it still isn't as seamless as a CaaS where developers can also access their pods... like we do with teleport on top of K8S, or with a CaaS based on knative such as Cloud Run.

There's an underrated quick win to achieve this very easily: portainer. All it takes to get a modern platform with a nice GUI to manage all the containers on your virtual machines is a lightweight agent running on each of them.



That's why we offer it to some of our customers with cwcloud. You can watch this demo to see how easily you can turn an infrastructure built on top of virtual machines and docker into a real CaaS platform using this combo5:

portainer_agent_demo

Portainer also works with K8S, which makes the lift and shift approach a lot easier.

To conclude, we like working with everyone and answering their needs, and we also like K8S very much (I've already said it multiple times). Some of our customers use K8S, and some are perfectly fine with compute engines running docker. For example, we have customers with multitenant applications who want to bill their own users for their cloud usage. It's more convenient this way because each of their customers pays for its own compute instances, instead of requiring complicated FinOps on shared K8S clusters6. We also have customers who require segregating the data and network of their different tenants.

So yes, it's still fine in 2024 to run docker in production; you just have to find a way to align with state-of-the-art cloud and DevOps practices :)


  1. In this blog post, I'm referring only to the docker engine, which is open source, and not to Docker Desktop, which isn't and which handles many other things to help developers (Linux virtualization, using QEMU to handle interoperability between CPU architectures...).
  2. You can check out this tutorial to understand how DaaS works with cwcloud and the difference between IaaS, PaaS and DaaS.
  3. This is pretty promising: unlike kompose, you can develop your own mapping rules to convert your compose files into K8S manifests with the shape you want (if you want my opinion, it's kinda like using helm to read the compose file as a values file, with many helpers that make it easier). It was presented by Guillaume Lours and Nicolas De Loof from Docker, Inc. at the last DevoxxFR 🇫🇷.
  4. I only mention ansible because I consider that it won the battle against puppet, chef, salt... a long time ago, for most of the remaining infrastructures based on virtual machines ;)
  5. This demo is two years old, and both our design and portainer's have improved a lot since then, but it still gives an idea of how easy it is to get a real CaaS platform on top of our DaaS.
  6. Yes, we could use K8S with tools like kubecost instead. However, it's easier for them to see their customers' names associated with the compute directly in the final cloud bills.

· 9 min read
Idriss Neumann

To cap off years of debate on whether Kubernetes (or K8S) fits everyone or not, I'll finally share my honest opinion, which took me some time and years of use to form.

Personally, I really like K8S, but I don't have any particular problem working with "traditional" infrastructures built on top of VM1 (Virtual Machines), especially because that sometimes benefits some of our customers.

Why is this debate still happening in 2024?

First, I'll try to understand why this debate is still happening in 2024, and it's pretty simple. If I had to give developers an analogy, using classic IaaS (Infrastructure as a Service), which provides VMs on demand, vs. using K8S is comparable to working with the C programming language vs. a very high-level framework like Spring Boot.

So we should have expected this debate: it's exactly like the one between developers who consider that framework users lose their skills and become proletarians, and framework users who observe that they get better business velocity. It's the same debate, and it will keep running for years in exactly the same way.

Understanding the anti-K8S point of view

In Unix/Linux, you already have everything you need to automate resilient infrastructures: command/shell interpreters and scripting languages, schedulers (cron, anacron...). When you take these elements one by one, a lot of people will say to themselves "there's nothing complicated, it's very easy to use", or, for those who have been using them for years, "why change a winning team?", and it's an understandable point of view.

For people who have built their business for years on those technologies, which aren't outdated at all, there's no return on investment in telling them to change, if we're honest for a few minutes. If we take the time to think and put ourselves in their shoes, we should realize that we're asking them to do work with no added value, because they already have a high satisfaction rate and spend minimal time intervening when there's an issue.

Some will even find flaws2 in the current implementation of K8S that can be legitimately debated, as with every technology, including the Linux kernel. This is particularly true of some cloud providers that have implemented their own resilient control-plane model, which has been working for years and hasn't been fully amortized yet. It's understandable that they try to sell their product, which isn't necessarily badly built and can still address a lot of use cases on the market very well. However, everyone should take a few steps back from the argument, because no solution is perfect; in the end, what matters are the pros and cons and the tradeoffs we choose to make as decision makers in order to get the best ROI (return on investment).

Understanding the pro-K8S point of view

Now there's a new generation of sysadmins we sometimes call "SREs" (which stands for Site Reliability Engineers) or even Platform Engineers, because we ask them to provide their skills and work routines as a service, with a time to market as competitive as that of the cloud providers who have been building their IaaS or PaaS (Platform as a Service) for decades. I'll try here to explain why K8S seems the perfect fit for those people.

Let's continue with the example of cron jobs on Linux. cron and anacron are great, very well known, and have been running for years. If we ask those people to schedule some tasks with cron jobs and make them auditable, resilient to shutdowns and even highly available, doing it with those well-known tools will require adding structured logs with a monitoring system, implementing exponential retries in case of failure, installing those crontabs on multiple servers and handling concurrency with a semaphore/lock design, reconciliation loops...
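To illustrate how much plumbing that is, here's a minimal Python sketch of a task meant to be started by cron that adds just two of those concerns: a lock so that overlapping runs on the same machine don't execute concurrently, and retries with exponential backoff. The lock file path and the task passed in are hypothetical. Note that `fcntl.flock` only works on Unix and only protects a single machine, so running the same crontab on several servers would still need a distributed lock, plus logs and monitoring on top:

```python
import fcntl
import logging
import time
from typing import Callable

LOCK_PATH = "/tmp/my-task.lock"  # hypothetical lock file


def with_retries(task: Callable[[], None], max_attempts: int = 5, base_delay: float = 1.0,
                 sleep: Callable[[float], None] = time.sleep) -> int:
    """Run task, waiting base_delay, 2*base_delay, 4*base_delay... between failed attempts.
    Returns the number of attempts used; re-raises the last error when all attempts fail."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for attempt in range(1, max_attempts + 1):
        try:
            task()
            return attempt
        except Exception:
            if attempt == max_attempts:
                raise
            delay = base_delay * 2 ** (attempt - 1)
            logging.warning("attempt %d failed, retrying in %.1fs", attempt, delay)
            sleep(delay)
    raise AssertionError("unreachable")


def run_exclusively(task: Callable[[], None], lock_path: str = LOCK_PATH) -> bool:
    """Skip this run if another run holds the lock; returns whether the task ran."""
    with open(lock_path, "w") as lock_file:
        try:
            # Non-blocking exclusive lock: fails immediately if already held.
            fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError:
            logging.info("another run is in progress, skipping")
            return False
        try:
            with_retries(task)
            return True
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)


if __name__ == "__main__":
    run_exclusively(lambda: print("doing the scheduled work"))
```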

That's starting to be a lot to handle for something that seemed simple, right? And this is only one of the easiest examples of what you have to handle when you're in charge of the reliability of your systems, and it's something that can matter for everyone nowadays, including small businesses. It would be a big mistake and bad judgement to think it's too fancy a consideration for small businesses, especially when they can handle it for a few dollars per month nowadays.

And that's the thing with K8S: it's already implemented by design, without any effort, with only 5 minutes of work and a single CLI or API invocation. And this is only the simplest example I could find, but it's exactly the same for every deployment automation aspect we've been used to seeing as a commodity for years.
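For comparison, here's a sketch of the K8S side: a `batch/v1` CronJob manifest, built as a Python dict and printed as JSON (which `kubectl apply -f -` accepts). `concurrencyPolicy: Forbid` skips a run while the previous one is still running, `backoffLimit` lets the Job controller recreate failed pods with an exponential back-off delay, and the controllers keep reconciling the desired state on whatever nodes are available. The name, schedule, image and command are hypothetical placeholders:

```python
import json


def cronjob_manifest(name: str, schedule: str, image: str, command: list[str],
                     retries: int = 4) -> dict:
    """Build a minimal batch/v1 CronJob manifest."""
    return {
        "apiVersion": "batch/v1",
        "kind": "CronJob",
        "metadata": {"name": name},
        "spec": {
            "schedule": schedule,            # same syntax as a crontab line
            "concurrencyPolicy": "Forbid",   # no overlapping runs
            "jobTemplate": {
                "spec": {
                    "backoffLimit": retries,  # failed pods retried with exponential back-off
                    "template": {
                        "spec": {
                            "restartPolicy": "Never",
                            "containers": [{"name": name, "image": image, "command": command}],
                        }
                    },
                }
            },
        },
    }


if __name__ == "__main__":
    manifest = cronjob_manifest(
        name="nightly-report",                      # hypothetical
        schedule="0 2 * * *",
        image="registry.example.com/report:1.0",    # hypothetical image
        command=["/app/report"],
    )
    print(json.dumps(manifest, indent=2))
```

Usage: `python cronjob.py | kubectl apply -f -`. Locking, retries and scheduling across nodes come from the platform instead of from code you maintain.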

What I'm trying to explain here is that seeing K8S as a distributed orchestrator for large businesses, which brings value only when there are a lot of autoscaling requirements, is the wrong perspective. K8S should in fact be seen as the new generation of IaaS, with a standard API that takes care of every commodity feature we could need regardless of the size of our business, that works almost everywhere and that avoids vendor lock-in.

That being said, just because it's available almost everywhere doesn't mean that its pricing model, or the cost of moving what's already running, brings the best ROI for everyone. Personally, I deeply regret that there's still no serverless implementation of the K8S API at any cloud provider, and that this API is still strongly coupled to a single codebase (with some flaws, as we said before). The "standard API specification existing everywhere" is only theoretical.

My personal point of view

You might know that we're building cwcloud, our DaaS (Deployment as a Service) and FaaS (Function as a Service) solution, because we strongly believe those are the best compromises between IaaS and PaaS. You can check out this tutorial for a better understanding.

We insist on continuing to provide those services, in an agnostic way, on both K8S and classical IaaS3, and we don't complain about it, because we consider it our job to adapt and try to make most people happy; that's what brings us more business.

However, we have to admit that K8S greatly reduces the complexity of the DaaS implementation, and it's noticeable just by looking at these two diagrams showing the architecture of both implementations:

Without K8S

daas-classical-iaas

With K8S

daas-k8s

You can easily understand here which version took us more efforts ;-)

So, as a developer, I also strongly believe that the Kubernetes API lowers deployment complexity exactly the same way a framework and runtime like Spring Boot lowers the cost of the commodity features we used to develop ourselves to expose our code as a microservice (HTTP exposure of our business logic, abstraction, logs, metrics...).

As a developer, I also love coding in C, as I love understanding low-level stuff in Linux/Unix operating systems. I feel more powerful and more competent with it. However, as a manager, I'd be a fool to avoid frameworks like Spring Boot that increase productivity. Analyzing average velocity and ROI is also part of the engineering process.

That being said, I'm not saying we should always pick the K8S option. Not at all: I already explained that, on one hand, it brings no added value for a lot of skilled people or already established infrastructures and, on the other hand, that K8S pricing at most cloud providers still isn't great everywhere4. We still lack serverless offers billed only on pod consumption, which would convince people that K8S isn't an orchestrator but a standard API definition for deploying anywhere in an agnostic way.

Honestly, we have customers using cwcloud without K8S, and it works just as well, sometimes with a better pricing model depending on the chosen cloud provider. As always, we have to analyze the pros and cons for each of them and help them accept some tradeoffs in order to bring the best possible business value.

And that's also the point for companies looking for a high-level, uncomplicated way to deploy their stack (PaaS, FaaS or DaaS): it's okay whether it runs on top of K8S or not. All that should matter to you is whether the SLA (Service Level Agreement) and your own velocity are good and whether your provider is reliable. Let them sell you their features, not how they built them. And it's also okay to change your mind in the future and rebuild everything when you have more funds and more needs...


  1. I know that K8S isn't tied to containers anymore and can orchestrate VMs (with kubevirt, for example) or WASM binaries. Here I'm referring to the classical IaaS or hypervisors we used almost everywhere before the rise of K8S and that are still widely used.
  2. For example, K8S relies on etcd, which is a stateful component and which cannot be "highly available" by design. But some distributions like K3S offer the ability to replace it with something more reliable like NATS (and some cloud providers have written their own rewrite of etcd).
  3. We are compatible with the IaaS APIs of AWS, GCP, Azure, Scaleway and OVH, and with OpenStack for on-premises infrastructures. Those are what I mean when I talk about "classical IaaS", which provide storage and compute as VMs.
  4. At the time of writing, Scaleway offers a free shared control plane, which is pretty promising (it used to be unstable but it's getting better, in my opinion). This way, you only pay for your nodes, at the same price as any compute instance (and their pricing is very competitive: you can have a fully functional cluster for less than 40 dollars per month). But this kind of deal isn't very common among the biggest cloud providers. Anyway, Scaleway is becoming a really great deal, if you want my honest opinion :)