Over the last couple of years, Airbnb engineering moved from a monolithic Ruby on Rails architecture to a service-oriented architecture (SOA). In our Rails architecture, we had an API per resource to access the underlying data. These APIs had authorization checks to protect sensitive data. As there was a single way to access a resource’s data, managing these checks was easy. In the transition to SOA, we moved to a layered architecture in which data services wrap databases and presentation services hydrate from multiple data services. Our initial approach was to move the permission checks from the monolith into the presentation services. However, this led to several problems:
- Duplicate and difficult to manage authorization checks: Often multiple presentation services that provided access to the same underlying data had duplicate code for authorization checks. In some cases these checks became out of sync and difficult to manage.
- Fan out to multiple services: Most of these authorization checks required calling into other services. This was slow, the load was difficult to maintain, and it impacted overall performance and reliability.
To tackle these issues, we made two changes:
- We moved the authorization checks to data services, instead of performing authorization checks only in presentation services. This helped us alleviate duplicate and inconsistent check issues.
- We created Himeji, a centralized authorization system based on Zanzibar, which is called from the data layer. It stores permissions data and performs the checks as a central source of truth. Instead of fanning out at read time, we write all permissions data when resources are mutated. We fan out on the writes instead of the reads, given our read-heavy workload.
Himeji exposes a check API for data services to perform authorization checks. The API signature is as follows:
// Can the principal do relation on entity?
boolean check(entity, relation, principal)
A permissions check will look like the following, which states “can user 123 write to listing 10’s description?”:
check(entity: "LISTING : 10 : DESCRIPTION", relation: WRITE, userId: 123)
This is interpreted by Himeji as the statement “is user 123 in the set of users that can write to listing 10’s description?”.
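To make the set-membership reading concrete, here is a minimal Python sketch. The in-memory `PERMISSIONS` dictionary and the string encoding of entities and principals are assumptions made for illustration; they are not Himeji's actual storage or client API.

```python
# Hypothetical in-memory stand-in for a permission store (not Himeji's API).
# Keys are (entity, relation) pairs; values are the sets of principals
# that hold that relation on that entity.
PERMISSIONS = {
    ("LISTING : 10 : DESCRIPTION", "WRITE"): {"User(123)"},
}


def check(entity: str, relation: str, principal: str) -> bool:
    """Return True if `principal` is in the set stored for (entity, relation)."""
    return principal in PERMISSIONS.get((entity, relation), set())


# "Can user 123 write to listing 10's description?"
print(check("LISTING : 10 : DESCRIPTION", "WRITE", "User(123)"))
# A user with no stored tuple is simply not in the set.
print(check("LISTING : 10 : DESCRIPTION", "WRITE", "User(456)"))
```

Modelling a check as a set lookup is what lets the later sections talk about unions of sets rather than chains of service calls.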
Similar to Zanzibar, the basic unit of storage for Himeji is a tuple in the form
entity # relation @ principal.
- An entity is the triple
(entity type : entity id : entity part); this comes from a natural language approach:
LISTING : 10 : DESCRIPTION→ “listing 10’s description”.
- An entity id is the corresponding id in the data source of truth.
- An entity type defines the data that permissions apply to. Examples used in this post: LISTING, RESERVATION.
- An entity part is an optional component. Examples used in this post: DESCRIPTION, LOCATION.
- A relation describes the relationship, like WRITE, but can be specific to use cases; some examples include HOST for the host of a reservation and DENY_VIEW for denying access to a listing.
- A principal is either an authenticated user identity like User(123), or a reference to another entity, written in this post as Reference(RESERVATION : 500).
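The tuple structure described in the list above can be sketched as a small data model. The class and field names here (`Entity`, `PermissionTuple`, `entity_part`, and so on) are hypothetical and chosen only to mirror the prose; the post does not show how Himeji represents tuples internally.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Entity:
    entity_type: str                   # e.g. "LISTING"
    entity_id: str                     # id in the data source of truth
    entity_part: Optional[str] = None  # optional, e.g. "DESCRIPTION"

    def __str__(self) -> str:
        parts = [self.entity_type, self.entity_id]
        if self.entity_part is not None:
            parts.append(self.entity_part)
        return " : ".join(parts)


@dataclass(frozen=True)
class PermissionTuple:
    entity: Entity
    relation: str   # e.g. "WRITE", "HOST", "DENY_VIEW"
    principal: str  # e.g. "User(123)" or a reference to another entity

    def __str__(self) -> str:
        # Render in the "entity # relation @ principal" form used in the text.
        return f"{self.entity} # {self.relation} @ {self.principal}"


t = PermissionTuple(Entity("LISTING", "10", "DESCRIPTION"), "WRITE", "User(123)")
print(t)
```

Freezing the dataclasses makes tuples hashable, so they can be stored in sets, which fits the set-based view of permissions.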
If we had to write a tuple for each exact permission that is checked, the volume of data and denormalization would grow exponentially. For example, we’d have to write both
LISTING : 10 # WRITE @ User(123) and
LISTING : 10 # READ @ User(123) for the listing owner to be able to both read and write.
Based on the Zanzibar configuration, we use a YAML-based configuration language that allows for the resolution of permissions checks via set algebra, allowing a developer to map a check to a set operation:
Suppose user 123 is the owner of listing 10. Then the database will have the tuple
LISTING : 10 # OWNER @ User(123).
When we request
check(entity: "LISTING : 10", relation: WRITE, userId: 123), Himeji interprets
LISTING # WRITE as the union of the tuples stored directly under
WRITE and, transitively, those stored under
OWNER. Therefore, it will fetch the following from its database, with any matches belonging to the set of
LISTING # WRITE:
Query LISTING : 10 # WRITE @ User(123) => Empty
Query LISTING : 10 # OWNER @ User(123) => Match User(123)
So for example, user 123 need only have
LISTING : 10 # OWNER @ User(123) to be in the
LISTING : 10 # WRITE set.
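The resolution described above can be sketched in a few lines of Python. The `REWRITES` dictionary is a hypothetical stand-in for the YAML configuration, whose real syntax is not reproduced here, and the READ rule is an assumption added only to show the transitive case.

```python
# Hypothetical stand-in for the configuration: for each (entity type, relation),
# the relations whose stored tuples are unioned to form that set.
REWRITES = {
    ("LISTING", "WRITE"): ["WRITE", "OWNER"],
    ("LISTING", "READ"): ["READ", "WRITE"],  # assumed rule, for illustration
}

# Stored tuples as (entity type, entity id, relation, principal).
STORED = {("LISTING", "10", "OWNER", "User(123)")}


def expand(entity_type, relation, seen=None):
    """Return every stored relation that contributes to (entity_type, relation)."""
    seen = set() if seen is None else seen
    if relation in seen:
        return seen
    seen.add(relation)
    for member in REWRITES.get((entity_type, relation), []):
        expand(entity_type, member, seen)
    return seen


def check(entity_type, entity_id, relation, principal):
    """True if any contributing relation has a stored tuple for the principal."""
    return any(
        (entity_type, entity_id, rel, principal) in STORED
        for rel in expand(entity_type, relation)
    )


print(sorted(expand("LISTING", "WRITE")))
print(check("LISTING", "10", "WRITE", "User(123)"))
```

With this shape, only the OWNER tuple is ever written; WRITE (and, under the assumed rule, READ) membership is derived at check time.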
We observed that entities at Airbnb frequently grant access to other entities as a result of their existence. For example, a guest of a reservation gains access to a listing’s location, along with other pieces of the listing’s information. We represent this use-case with a tuple where the principal is a reference to an entity, i.e.
LISTING : $id # RESERVATION @ Reference(RESERVATION : $reservationId). This allows us to express the concept that a user in the ‘guest’ set of a reservation that is in the ‘reservation’ set of a listing is in the
LISTING : LOCATION # READ set, minimizing the amount of data that needs to be stored:
- LISTING : $id # RESERVATION @
Reference(RESERVATION : $reservationId # GUEST)
This approach differs from Zanzibar in that the stored tuple does not contain a relation within the principal (that is, it does not store
Reference(RESERVATION:$id # GUEST) ). Instead, the relation following a referenced entity is static and retrieved from configuration. Checking the listing example against other use cases, we found that a reference is typically followed to multiple relations. In our product, the set of relations used between two entity types does not vary; a change in the set means a product change and applies across all entity types. If the set of relations between two entity types (e.g.
Reference(RESERVATION:$id # GUEST),
Reference(RESERVATION:$id # COTRAVELLER),
Reference(RESERVATION:$id # BOOKER), … ) has size
M, writing a tuple for each of these leads to
N*M tuples. By pulling the relation into configuration, we reduce the size of the stored data to N tuples.
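A small count illustrates the saving. This sketch assumes N is the number of entity-to-entity references and M the number of relations followed through each reference; the listing and reservation ids are made up.

```python
# Relations followed from a listing to a reservation (M = 3).
RELATIONS = ["GUEST", "COTRAVELLER", "BOOKER"]

# Made-up links: listing 10 has two reservations (N = 2 references).
RESERVATIONS_BY_LISTING = {"10": ["500", "501"]}


def tuples_with_relation_in_principal(links):
    """One stored tuple per (reference, relation) pair: N*M tuples."""
    return [
        f"LISTING : {listing_id} # RESERVATION @ "
        f"Reference(RESERVATION : {reservation_id} # {relation})"
        for listing_id, reservation_ids in links.items()
        for reservation_id in reservation_ids
        for relation in RELATIONS
    ]


def tuples_with_relation_in_config(links):
    """One stored tuple per reference; relations live in configuration: N tuples."""
    return [
        f"LISTING : {listing_id} # RESERVATION @ "
        f"Reference(RESERVATION : {reservation_id})"
        for listing_id, reservation_ids in links.items()
        for reservation_id in reservation_ids
    ]


print(len(tuples_with_relation_in_principal(RESERVATIONS_BY_LISTING)))
print(len(tuples_with_relation_in_config(RESERVATIONS_BY_LISTING)))
```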
At read execution time, suppose the following tuples are stored in the database:
LISTING : 10 # OWNER @ User(123)
LISTING : 10 # RESERVATION @ Reference(RESERVATION : 500)
RESERVATION : 500 # GUEST @ User(456)
Now, if a client sends a request like:
check(LISTING : 10 : LOCATION # READ, User(456))
then Himeji issues the first DB fetch, based on the information from the request and the configuration:
Query LISTING : 10 # RESERVATION => Match Reference(RESERVATION:500)
Query LISTING : 10 # OWNER @ User(456) => Empty
Himeji will then issue the second DB fetch, substituting in the id of the reservation found. A match indicates that user 456 is in the set of users allowed to read listing 10’s location.
Query RESERVATION : 500 # GUEST @ User(456) => Match User(456)
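The two fetches above can be sketched as follows. This is an illustrative in-memory model, not Himeji's implementation: `DIRECT_RELATIONS` and `REFERENCE_RELATIONS` are hypothetical stand-ins for the configuration of LISTING : LOCATION # READ, and the string parsing of `Reference(...)` is an assumption about encoding.

```python
# Stored tuples from the example: (entity, relation, principal).
TUPLES = {
    ("LISTING : 10", "OWNER", "User(123)"),
    ("LISTING : 10", "RESERVATION", "Reference(RESERVATION : 500)"),
    ("RESERVATION : 500", "GUEST", "User(456)"),
}

# Hypothetical configuration for LISTING : LOCATION # READ.
DIRECT_RELATIONS = ["OWNER"]                      # checked on the listing itself
REFERENCE_RELATIONS = {"RESERVATION": ["GUEST"]}  # reference -> relations to follow


def principals(entity, relation):
    """All principals stored for (entity, relation)."""
    return {p for (e, r, p) in TUPLES if e == entity and r == relation}


def can_read_location(listing_id, user):
    entity = f"LISTING : {listing_id}"
    # First fetch: direct relations on the listing.
    if any(user in principals(entity, rel) for rel in DIRECT_RELATIONS):
        return True
    # First fetch also returns references; the second fetch follows each one
    # using the relations named in configuration, not in the stored tuple.
    for ref_relation, followed in REFERENCE_RELATIONS.items():
        for ref in principals(entity, ref_relation):
            target = ref[len("Reference("):-1]  # e.g. "RESERVATION : 500"
            if any(user in principals(target, rel) for rel in followed):
                return True
    return False


print(can_read_location("10", "User(456)"))
```

Because the followed relations come from configuration, adding another relation such as COTRAVELLER to `REFERENCE_RELATIONS` would require no change to the stored reference tuples.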