Kaiko
Senior Site Reliability Engineer (Onsite / Hybrid / Remote) - Europe
The Challenge
You will be joining a fast-paced engineering team made up of people with significant experience working with terabytes of data. We believe that everybody has something to bring to the table, and therefore put collaborative effort and teamwork above all else (and not just from an engineering perspective).
You will be able to work autonomously as an equally trusted member of the team, and participate in efforts such as:
- Addressing high availability problems: cross-region data replication, disaster recovery, etc.
- Addressing “big data” problems: 200+ million messages/day, 160B data points since 2010 (currently growing at a rate of 10B per month).
- Improving our development workflow, continuous integration, continuous delivery and, in a broader sense, our team practices.
- Expanding our platform’s observability through monitoring, logging, alerting and tracing.
What you’ll be doing
- Deploy, maintain and evolve our infrastructure (we have 2 autonomous regions) for optimum data consistency and availability while keeping costs down
- Automate what is not yet automated, fix what needs fixing, and contribute ideas
- Adapt fast
Our tech stack
- Alerting: AlertManager, Karma, PagerDuty
- Logging: Vector, Loki
- Caching: FoundationDB
- Secrets management and PKI: Vault
- Configuration management and provisioning: Terraform, Ansible
- Service discovery: Consul
- Messaging: Kafka
- Proxying: HAProxy, Traefik
- Service deployment: Terraform, Nomad (plugged into Consul and Vault)
- Database systems: ClickHouse (main datastore), FoundationDB (caching, deduplication), replicated PostgreSQL
- Operating System: Ubuntu 20.04
- Protocols: gRPC, HTTP (phasing out in favor of gRPC), WebSocket (phasing out in favor of gRPC)
- Platform: containers
About You
- Significant experience as a DevOps/System Engineer
- Experience with Linux system administration and automation (Ansible at a minimum)
- Hands-on experience with, in no particular order: troubleshooting crashes and performance issues, load balancing, VIPs/failover IPs, RAID
You’ll notice that we don’t have any “hard” requirements in terms of development platforms or technologies: this is because we are primarily interested in people capable of adapting to an ever-changing landscape of technical requirements, who learn fast and are not afraid to constantly push our technical boundaries.
It is not uncommon for us to benchmark new technologies for a specific feature, or to change our infrastructure in a big way to better suit our needs.
The most important skills for us revolve around two things:
- What we like to call “core” knowledge: what a software process is, how it interacts with a machine’s or the network’s resources, what kind of constraints we can expect for certain workloads, etc.
- How fast you can adapt to a technology you didn’t know existed 10 minutes ago
In short, we are looking for someone able to spot early on that, when the goal is to improve performance in the long term, spending 10 days migrating data to a more efficient schema is a better solution than scaling out a database cluster in a matter of minutes.
Nice to have
- Experience with HashiCorp tools (Terraform, Vault, Consul, Nomad)
- Experience with orchestrating containers and microservices
- Experience with recent Ubuntu, systemd
- Knowledge of networking, routing (BGP, static, …) and tunneling
- Knowledge of encryption (PGP/TLS/SSH/WireGuard/…)
- Basic knowledge of cryptocurrencies
Personal Skills
- Honest: receiving and giving feedback is very important to you
- Humble: making new errors is an essential part of your journey
- Empathetic: you feel a sense of responsibility for all the team’s endeavors rather than focusing on individual contributions
- Committed: as an equally important member of the team, you want to make yourself heard while respecting everybody’s point of view
- Fluent in written and spoken English
- You have the utmost respect for legacy code and infrastructure, with some occasional and perfectly understandable respectful complaints
What we offer
- An entrepreneurial environment with a lot of autonomy and responsibilities
- Opportunity to work with an internationally diverse team
- Hardware of your choice
- Perks: meal vouchers, multiple team events and staff surprises
Process
- Introduction call (30 min)
- Meeting with members of the team for a technical/product RPG: you read that right, no written test, no whiteboard quicksort implementation (1 h 30 min)
- Cross-team interviews (2-3 people, 2 x 45 min)
- Meeting with VP of Engineering (20 min)
As our working language is English, we would appreciate it if you send us your application and any accompanying documents in English.
Location
On-site in our Paris office, or fully remote (within ±2 hours of CET).
Diversity & Inclusion
At Kaiko, we believe in diversity of thought because it makes us stronger. We therefore encourage applications from everyone who can bring their unique experience to our collective achievements.