Category: LinkedIn

How LinkedIn is Working to Address Confusion Between Vendor Email and Phishing Attacks Throughout the Industry

What happens when an employee reports each of these messages as phishing? For one message, they might get a “thank you” or “good job” for recognizing and reporting the threat. For the other message, which appears equally suspicious to the …

Dynamometer: Scale Testing HDFS on Minimal Hardware with Maximum Fidelity

Larger clusters submit more operations, and in the presence of the discussed performance regression, larger clusters also require a longer time to complete each operation. This combination results in a superlinear performance impact.

In the process of investigating our original …

NYC Engineering: Making Strides to Encourage More Women in Tech

“Winterns” with LinkedIn’s NYC Engineering department as a part of the WiTNY Winternship program

The Winterns worked on a wide range of programming and project-specific tasks. In addition to training in agile development and learning about the developer framework, they …

Lessons Learned from LinkedIn’s Data Center Journey

I recently had the pleasure of speaking to groups of data center executives and strategists at the DCD Zettastructure conference in Singapore and at the DCD Converged conference in Hong Kong. Putting my thoughts together for these talks gave me …

Trends in AI for Online Education: Recommending Micro-Content for Micro-Learning

Editor’s Note: Shivani Rao is a Senior Applied Researcher on the LinkedIn Learning Relevance team, where she works on applications of recommendation systems to online learning. She recently spoke at the RE•WORK Women in Machine Intelligence Dinner in San Francisco

Getting to Know Todd Palino

What are some of the coolest projects that you and your team have been working on?
Because Apache Kafka was originally developed at LinkedIn, we have a very strong Kafka development team. Both the Kafka development and SRE teams are …

Enabling Dual Stack on LinkedIn CDNs

Co-authors: Erin Atkinson and Bhaskar Bhowmik

A few months ago, LinkedIn surpassed the 50% IPv6 traffic milestone. In this post, we will look into the methodology we adopted to measure performance as we enabled IPv6 on our content delivery …

Gobblin Enters Apache Incubation

Gobblin is a distributed data integration framework that simplifies common aspects of big data integration, such as ingestion, replication, organization, and lifecycle management, for both streaming and batch ecosystems.

Gobblin has been gobbling big data with ease in the open …

Now You See Me, Now You Don’t: LinkedIn’s Real-Time Presence Platform

Thus, during the process heartbeat step, we create this delayed trigger if it doesn’t yet exist for the member. If it already exists, we simply reset it to fire in another d + 2ε seconds. When the delayed trigger for …
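
The create-or-reset behavior described above can be sketched as follows. This is a minimal illustration, not the platform's implementation: the class name `PresenceTracker`, its methods, and the explicit `now` argument (a simulated clock) are all hypothetical. The excerpt does not say what happens when a trigger fires, so the sketch only reports which members' triggers have expired.

```python
from dataclasses import dataclass, field


@dataclass
class PresenceTracker:
    """Hypothetical sketch of a resettable delayed trigger per member."""

    d: float        # heartbeat interval in seconds (illustrative value chosen by the caller)
    epsilon: float  # tolerance term in seconds (illustrative value chosen by the caller)
    deadlines: dict = field(default_factory=dict)  # member_id -> time at which the trigger fires

    def process_heartbeat(self, member_id, now):
        """Create the member's delayed trigger, or reset it if it already exists.

        Returns True if the trigger was newly created, False if it was reset.
        """
        created = member_id not in self.deadlines
        # In both cases the trigger is set to fire d + 2*epsilon seconds from now.
        self.deadlines[member_id] = now + self.d + 2 * self.epsilon
        return created

    def fire_expired(self, now):
        """Remove and return every member whose trigger deadline has passed."""
        expired = [m for m, deadline in self.deadlines.items() if deadline <= now]
        for member_id in expired:
            del self.deadlines[member_id]
        return expired


tracker = PresenceTracker(d=10.0, epsilon=1.0)
tracker.process_heartbeat("alice", now=0.0)   # creates the trigger, due at t = 12
tracker.process_heartbeat("alice", now=8.0)   # resets the trigger, now due at t = 20
print(tracker.fire_expired(now=12.0))         # the reset trigger has not fired yet
print(tracker.fire_expired(now=20.0))         # no heartbeat since t = 8, so it fires
```

Passing the clock in as an argument keeps the sketch deterministic; a real service would presumably rely on a timer or scheduler instead.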

Project STAR*: Streamlining Our On-Call Process

Problem statement

As with any good project, the first thing we had to do was clearly articulate the problem statement. It was obvious that there were several different problem areas within Voyager On-Call, so we sat down and came up …

Ember Timer Leaks: The Bad Apples in Your Test Infrastructure

Background: 3×3 at LinkedIn

At LinkedIn, we pride ourselves on our 3×3 system: the notion that we should be able to ship code to production three times a day, with no more than three hours between releases, so that

Xenia: A Domain-Specific Framework for Building Optimized SEO Guest Experiences

Co-authors: Ajit Datar, Reza Arbabi, and Chirag Patel

LinkedIn is a network of professionals used by more than half a billion members to meet their professional goals. However, there are even more professionals who are either not on …

Venice Hybrid: Doing Lambda Better

Over the last two years at LinkedIn, I’ve been working on a distributed key-value database called “Venice.” Venice is designed to be a significant improvement to Voldemort Read-Only for serving derived data. In late 2016, Venice started serving …

The Statistical Modeling System Powering LinkedIn Salary

Introduction

For most job seekers, salary (or, more broadly, compensation) is a crucial consideration in choosing a new job opportunity. Indeed, more candidates (74%) want to see salary information compared to any other feature in a job posting, according to …

Project Falco joins SONiC Community (Software for Open Networking in the Cloud)

The Production Engineering team at LinkedIn quickly recognized the value that SONiC and its open source community would bring as the network operating system for our Project Altair-defined architecture data center fabric.

As discussed in earlier blog posts, our …

Automating Your Oncall: Open Sourcing Fossor and Ascii Etch

One of our sayings in Site Reliability Engineering (SRE) is that the goal of your job is to “automate yourself out of the job.” While some may have concerns about being replaced by robots, SREs see the value of automating …

The Glimmer Binary Experience

Co-authors: Sarah Clatterbuck, Chad Hietala, and Tom Dale

A bit over a year ago, Ember.js got a major overhaul. In a tight collaboration between LinkedIn engineers and the open source community, we replaced Ember’s rendering engine with a …

Couchbase Ecosystem at LinkedIn

Core Couchbase services

Couchbase server
LinkedIn runs a mix of Couchbase Community and Enterprise Editions over our infrastructure. The deployments of Couchbase range from 3 nodes per cluster up to 72 in our largest cluster. Currently, we deploy this as …

Getting to Know Dave Herman

Before joining LinkedIn in August of 2017, Dave worked at Mozilla for about seven years. There, he founded and led the Mozilla Research department, which contributed to the creation of a number of web-related technologies, including the Rust programming language, …

Improving Resiliency and Stability of a Large-scale Monolithic API Service

How was multi-clustering achieved?
We started by partitioning the endpoints of our service. This was relatively easy to do, because endpoints in the service were already divided into separate ownership groups called “verticals.” Then, using the data collected by our …

Incremental Data Capture for Oracle Databases at LinkedIn: Then and Now

We designed and developed an independent framework to propagate Oracle schema changes (DDLs) to Kafka. It allows near-synchronous propagation of schema changes through integration with the Oracle release framework. It also makes APIs available for on-demand invocation by downstream (e.g., …

Open Sourcing Our WomenConnect Event Framework

Erica interviewing Ya Xu, Principal Staff Engineer and Statistician

Share your personal journey
After Jeff’s conversation with Erica, I was lucky enough to join her onstage for a chat. We talked about our individual journeys and the problems we’ve personally …

Getting to Know Lauren Caponong

Prior to working in design and engineering, she obtained a degree in fine arts, with concentrations in painting, drawing, and digital art.

Why are you so passionate about video?
Bringing video to LinkedIn is rather exciting for me because video …

Resilience Engineering at LinkedIn with Project Waterbear

Our latest home page depends on more than 550 different endpoints in its dependency tree. It is very difficult for developers to ensure expected “graceful” degradation on the home page for every failure scenario involving this many endpoints. With LinkedOut, …

Dali Views: Functions as a Service for Big Data

Co-authors: Carl Steinbach and Vasanth Rajamani

Big challenges in the big data ecosystem

At LinkedIn, we have a number of challenges managing data in our complex data ecosystem. Changes to our infrastructure are often necessary to make progress, but they …

ParSeq: Asynchronous Java Made Easier

ParSeq is LinkedIn’s framework for writing asynchronous Java, and powers many of LinkedIn’s largest web services. It has proven invaluable for developer productivity, as well as essential for web service observability. ParSeq is well-adopted at LinkedIn in both the frontend …

Fixing the Plumbing: How We Identify and Stop Slow Latency Leaks at LinkedIn

So, e.g., a regression of a “high” traffic page is classified as P0 if PLT this week is more than 30% above the baseline (high water mark). This regression will be resolved only when the PLT is less than 5% above …

Our Partnership with TechWomen Grows

A group of LinkedIn mentors and the Emerging Leaders hosted at LinkedIn

Areas of focus for ELs at LinkedIn

LinkedIn matched mentors with ELs based on similarities in the areas of focus. This year, our ELs wanted to focus on …

REACH Pilot Results in 80% Conversion: Making Strides in Cultivating Talent from Non-traditional Backgrounds

Our REACH program is designed to give highly determined individuals with strong technical skills the opportunity to gain on-the-job experience they need to become full-time software engineers. It also helps us hire more talent to join our engineering team. I’m …

Building Smart Replies for Member Messages

Hypothetical Dagli pipeline for smart replies. Circles represent inputs to the DAG. Arrows connect the result of one node to the input of another.

Inference, Personalization and Diversity

When you receive a message, it’s used, together with the preceding conversation, …

Sleek and Fast: Speeding Up your Fat Web Client

Once our RUM measurements and Session-weighted p90 were established so that we would know when we were at least as good as our existing site, we were almost ready to start the hard work of becoming sleek and fast.

Knowing …

Streaming Data Pipelines with Brooklin

Datastream management API
This is a REST endpoint to create, update, manage, or delete datastream objects. It stores datastream objects in Zookeeper. At LinkedIn, we have a self-service portal called Nuage that facilitates creating and managing infrastructure resources, including Brooklin.…

Getting to Know David Max

Before joining LinkedIn in the New York City office in February 2015, he worked at Google on ad serving for about two years. Before that, he spent about 10 years working in financial technology for various financial firms, such as …

Health Score Metrics as a Software Craftsmanship Enabler

The notion of software craftsmanship is sometimes a muddy one. On the one hand, engineers find it hard to grasp and materialize craftsmanship, which is an abstract objective that, by itself, provides little guidance to the software engineering practice. On …

Open Sourcing Our Women in Tech High School Trainee Program

2017 Trainees survive a Murder Mystery lunch together

The results of this program speak for themselves. Even this year, as we incorporated students who expressed less up-front interest in or exposure to technology, we continued to exceed our goals.

2015-2017

Preparing to Celebrate Women in Tech at GHC ’17

I’ve been in the technology industry for 17 years and consider my attendance at the Grace Hopper Celebration of Women in Computing as one of the most powerful experiences of my career—and one of the most fulfilling. I’m looking forward …

Serving Top Comments in Professional Social Networks

For the purposes of comment relevance, we needed a serving subsystem that could satisfy the following requirements:

  1. A system with an index that is able to retrieve all comments on a comment thread (quickly).

  2. Fast access to the list of

Query Analyzer: A Tool for Analyzing MySQL Queries Without Overhead

Benchmarking CPU utilization with various tools

Metrics Collection

For the initial version of Query Analyzer, we used MySQL to store the metrics (essentially time-series data). There are two tables: query_history and query_info. The query_history is where we save …

Getting to Know Shivam Sharma

What are some of the coolest projects that you and your team have been working on?
One of the coolest projects I have had the pleasure of working on is leveraging the Ember Data persistence library and building services on …

Common Issue Detection for CPU Profiling

The issues encountered are varied, but some common patterns have emerged:

Logging
Logging is very common in services, and is expected to be cheap. However, older logging frameworks, synchronized loggers, short-lived logger objects, and function evaluations during logging can all …

Open Sourcing Kafka Cruise Control

Anomaly detector
The anomaly detector identifies two types of anomalies:

  1. Broker failures: i.e., a non-empty broker leaves a cluster, which results in under-replicated partitions. Since this can happen during normal cluster bounces as well, the anomaly detector provides a configurable

External Library Management

At LinkedIn, when our engineers create software, there is often a need to leverage some of the great work done by the open source community outside of LinkedIn. In our continuous delivery parlance we refer to these assets as External …

An Update on the REACH Program

Co-authors: Shalini Agarwal, Joel Young, Ali Mohamed, and Yi Shen

Earlier this year, we kicked off our inaugural REACH program, which brings in software engineers from non-traditional tech backgrounds to apprentice at LinkedIn for six months. Our …

Creating Video Sharing on LinkedIn

Once the engineering leads came to a shared understanding of how the individual systems would need to interact with one another, the teams developed, prioritized, and scoped their own roadmap for the features and systems for which they were responsible. …

JARVIS: Helping LinkedIn Navigate its Source Code

Relevance

Relevance is a very important piece for any search system, and our codesearch is no exception. It is very important to show files at the top that users are most likely to open. Relevance for us involves assigning a …

Getting to Know Nikolai Avteniev

What other projects are you involved in outside of video sharing?
Outside of video sharing, I work on projects with the LinkedIn for Good organization. I’m part of a volunteer team of LinkedIn employees who help American veterans get access …

Scaling Contextual Conversation Suggestions Over 500 Million Members

We want to find all the paths of maximum length 3 from a starting node (a viewing member) to a destination node (company), and rank them based on the edges’ weights. Note that we only recommend the first degree connections. …
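
The path search described above can be sketched as a bounded depth-first traversal. This is an illustration under stated assumptions, not the production algorithm: the function name `ranked_paths`, the adjacency-dictionary graph format, and the sample data are hypothetical, and scoring a path by the product of its edge weights is an assumed choice, since the excerpt only says that paths are ranked by edge weights.

```python
def ranked_paths(graph, source, target, max_edges=3):
    """Return (score, path) pairs for simple paths from source to target.

    Only paths with at most max_edges edges are kept, sorted best first.
    Scoring by the product of edge weights is an assumption of this sketch.
    """
    results = []

    def walk(node, path, score):
        if node == target:
            results.append((score, path))
            return
        if len(path) - 1 == max_edges:
            return  # path already uses the maximum number of edges
        for neighbor, weight in graph.get(node, {}).items():
            if neighbor not in path:  # keep paths simple (no repeated nodes)
                walk(neighbor, path + [neighbor], score * weight)

    walk(source, [source], 1.0)
    results.sort(key=lambda item: item[0], reverse=True)
    return results


# Hypothetical graph: node -> {neighbor: edge weight}
example_graph = {
    "viewer": {"ann": 0.75, "bob": 0.5},
    "ann": {"acme": 0.5, "cara": 1.0},
    "bob": {"acme": 1.0},
    "cara": {"acme": 0.25},
}

for score, path in ranked_paths(example_graph, "viewer", "acme"):
    # path[1] is the viewer's first-degree connection on this path
    print(score, " -> ".join(path))
```

Because only first-degree connections are recommended, the second node of each ranked path (`path[1]`) is the candidate to surface to the viewing member.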

The TCP Tortoise: Optimizations for Emerging Markets

Serving fast pages is a core aspiration at LinkedIn. As part of this initiative, we continuously experiment and study the various layers of our stack and identify optimizations to ensure that we use the optimal protocols and configurations …

InGraphs: Monitoring and Unexpected Artwork

At LinkedIn, we have an internal tool for visualizing operational metrics that we call inGraphs. Since I started working for LinkedIn almost four years ago, I’ve been snapshotting inGraphs that I thought were interesting—the ones that had helped to solve …

Migrating to Espresso

Babylonia makes direct writes to Espresso.

Ensuring consistency
We’ve had three different processes writing data to our Espresso database: the bulk loader, the Databus listener, and Babylonia itself. One issue we needed to tackle was how we would allow these …

Open Sourcing Jaqen, A Tool For Developing DNS Rebinding PoCs

Editor’s note: Members of the information security team at LinkedIn have an opportunity to work on research topics under a well-defined framework that allows them to evaluate new products and technologies, as well as explore the related threat surface. The

Powering Helix’s Auto Rebalancer with Topology-Aware Partition Placement

Partition assignments are critical to a typical distributed data system. A partition’s replicas could be in different states. For example, in the above graph, each partition has three replicas; one of them is the Primary replica, while the other two

LinkedIn Passes IPv6 Milestone

Earlier this month, and for the first time in our company’s history, more than 50% of pages on LinkedIn were accessed over IPv6 from mobile devices in the U.S. This is another step in the internet community’s long migration to …

Creating #DataScienceHappiness

In a previous post, I gave some advice for those who are interested in a career in data science. One of the suggestions I made was to find a work environment that values and promotes a good data science …

Building the Activity Graph, Part 2

An important feature of EFS is that an EFS instance never starts cold. We keep a copy of the keys stored in the in-memory cache in an instance of RocksDB which we call “CachedKeysStore.” Each time a record is added, …

Hiring SREs at LinkedIn

Before talking about the actual mechanics of the interview process, though, I want to talk a little bit about the theory behind interviewing and hiring. In any circumstance where you’re doing hiring, you have a specific need that you need …

Open Sourcing Iris and Oncall

Over the year and a half that we’ve been using Iris, we’ve onboarded many new services, and consequently have seen an explosion in the number of incidents it handles. However, with that growth comes the responsibility of guaranteeing reliable escalation …

Glimmer: Blazing Fast Rendering for Ember.js, Part 2

Since every mustache expression in our templates is backed by an object with the Dirtyable Tag on it, we know exactly when the UI needs to update. As we mentioned in the VM execution section, the initial render builds an …