  1. Duplicate and difficult-to-manage authorization checks: Multiple presentation services that provided access to the same underlying data often duplicated the code for authorization checks. In some cases these checks drifted out of sync and became difficult to manage.
  2. Fan out to multiple services: Most of these authorization checks required calling into other services. This was slow, the load was difficult to maintain, and it impacted overall performance and reliability.

[Figure: Early authorization checks]
[Figure: Himeji-based authorization checks]

To tackle these issues, we made two changes:

  1. We moved the authorization checks to data services, instead of performing authorization checks only in presentation services. This helped us alleviate duplicate and inconsistent check issues.
  2. We created Himeji, a centralized authorization system based on Zanzibar, which is called from the data layer. It stores permissions data and performs the checks as a central source of truth. Instead of fanning out at read time, we write all permissions data when resources are mutated. We fan out on the writes instead of the reads, given our read-heavy workload.
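
As a rough illustration of fanning out on writes, the sketch below records permission tuples whenever a resource is mutated, so that a later check becomes a lookup rather than a series of service calls. The `PermissionStore` class and the `on_*` hooks are hypothetical names invented for this example, not Himeji's actual API; the tuple shapes mirror the ones shown later in this post.

```python
class PermissionStore:
    """Hypothetical in-memory stand-in for the permissions database."""

    def __init__(self):
        self.tuples = set()

    def write(self, entity, relation, principal):
        self.tuples.add((entity, relation, principal))


def on_listing_created(store, listing_id, owner_id):
    # The mutation that creates the listing also records who owns it.
    store.write(f"LISTING:{listing_id}", "OWNER", f"User({owner_id})")


def on_reservation_created(store, listing_id, reservation_id, guest_id):
    # One mutation fans out into several permission tuples at write time.
    store.write(f"LISTING:{listing_id}", "RESERVATION",
                f"Reference(RESERVATION:{reservation_id})")
    store.write(f"RESERVATION:{reservation_id}", "GUEST", f"User({guest_id})")


store = PermissionStore()
on_listing_created(store, 10, 123)
on_reservation_created(store, 10, 500, 456)
```

With a read-heavy workload, paying this cost once per mutation is cheaper than repeating the fan-out on every read.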

The check API has the following shape:

// Can the principal do relation on entity?
boolean check(entity, relation, principal)

A permissions check will look like the following, which states “can user 123 write to listing 10’s description?”:

check(entity: "LISTING : 10 : DESCRIPTION",
      relation: WRITE,
      principal: User(123))

This is interpreted by Himeji as the statement “is user 123 in the set of users that can write to listing 10’s description?”.

  • An entity is the triple (entity type : entity id : entity part); this comes from a natural language approach: LISTING : 10 : DESCRIPTION → “listing 10’s description”.
  • An entity id is the corresponding id in the data source of truth.
  • An entity type defines the data permissions apply to. Examples: LISTING, RESERVATION, USER
  • An entity part is an optional component. Examples: DESCRIPTION, PRICING, WIFI_INFO
  • A relation describes the relationship, like OWNER, READ, or WRITE, but can be specific to use cases; some examples include HOST for the host of a reservation and DENY_VIEW for denying access to a listing.
  • A principal is either an authenticated user identity like User(123), or another entity like Reference(LISTING:15).
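
To make the vocabulary above concrete, here is a minimal sketch of the data model in Python. The class and field names (`Entity`, `User`, `Reference`, `format_tuple`) are assumptions chosen for illustration and are not taken from Himeji itself; only the `entity # relation @ principal` notation comes from this post.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Entity:
    entity_type: str                   # e.g. LISTING, RESERVATION, USER
    entity_id: int                     # id in the data source of truth
    entity_part: Optional[str] = None  # optional, e.g. DESCRIPTION

    def __str__(self):
        parts = [self.entity_type, str(self.entity_id)]
        if self.entity_part is not None:
            parts.append(self.entity_part)
        return " : ".join(parts)


@dataclass(frozen=True)
class User:
    user_id: int

    def __str__(self):
        return f"User({self.user_id})"


@dataclass(frozen=True)
class Reference:
    entity: Entity  # a principal that points at another entity

    def __str__(self):
        return f"Reference({self.entity})"


def format_tuple(entity, relation, principal):
    # Render a permission tuple in the notation used in this post.
    return f"{entity} # {relation} @ {principal}"


owner_tuple = format_tuple(Entity("LISTING", 10), "OWNER", User(123))
```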

Based on the Zanzibar configuration, we use a YAML-based configuration language that allows for the resolution of permissions checks via set algebra, allowing a developer to map a check to a set operation:

LISTING:
  '#WRITE':
    union:
      - '#WRITE'
      - '#OWNER'
  '#READ':
    union:
      - '#READ'
      - '#WRITE'

Suppose user 123 is the owner of listing 10. Then the database will have the tuple LISTING : 10 # OWNER @ User(123).

When we request check(entity: "LISTING : 10", relation: WRITE, principal: User(123)), Himeji interprets LISTING # WRITE as the union of WRITE & OWNER, as defined in the configuration. (A READ check expands transitively in the same way: LISTING # READ is the union of READ & WRITE, and LISTING # WRITE is in turn the union of WRITE & OWNER.) Therefore, it will fetch the following from its database, with any matches belonging to the set of LISTING # WRITE:

Query LISTING : 10 # WRITE @ User(123) => Empty
Query LISTING : 10 # OWNER @ User(123) => Match User(123)

So for example, user 123 need only have LISTING : 10 # OWNER @ User(123) to be in the LISTING : 10 # WRITE set.
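
The resolution described above can be sketched as a small recursive lookup. This is only an illustration under stated assumptions: `CONFIG` is a hand-written Python mirror of the LISTING configuration above, `TUPLES` is an in-memory stand-in for the database, and the `check` signature here is simplified for the example rather than being Himeji's real interface.

```python
# Mirror of the LISTING configuration: each relation is a union of relations.
CONFIG = {
    "LISTING": {
        "WRITE": ["WRITE", "OWNER"],
        "READ": ["READ", "WRITE"],
    }
}

# Stored tuples: (entity type, entity id, relation, principal).
TUPLES = {("LISTING", 10, "OWNER", "User(123)")}


def check(entity_type, entity_id, relation, principal, _seen=None):
    seen = _seen if _seen is not None else set()
    if relation in seen:
        return False  # already expanded; avoids looping on cyclic configs
    seen.add(relation)

    # A relation naming itself in its union means "tuples stored directly".
    if (entity_type, entity_id, relation, principal) in TUPLES:
        return True

    # Expand the other members of the union transitively.
    for member in CONFIG.get(entity_type, {}).get(relation, []):
        if member != relation and check(
            entity_type, entity_id, member, principal, seen
        ):
            return True
    return False


can_write = check("LISTING", 10, "WRITE", "User(123)")
can_read = check("LISTING", 10, "READ", "User(123)")
```

Both checks succeed from the single stored OWNER tuple, because READ expands to WRITE and WRITE expands to OWNER.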

LISTING:
  LOCATION:
    '#READ':
      union:
        - '#OWNER'
        - 'LISTING : $id # RESERVATION @ Reference(RESERVATION : $reservationId # GUEST)'

Where this approach differs from Zanzibar is that the stored tuple does not contain a relation within the principal, as Reference(RESERVATION:$id # GUEST) would. The relation following a referenced entity is static and retrieved from configuration. Taking the listing example and then checking against other use cases, we found that a reference will typically be followed to multiple relations. In our product, there is no variance in the set of relations used between two entity types; a change in the set means a product change and applies across all entity types. If there are N references between two entity types and the set of relations (i.e. Reference(RESERVATION:$id # GUEST), Reference(RESERVATION:$id # COTRAVELLER), Reference(RESERVATION:$id # BOOKER), … ) has size M, writing a tuple for each relation leads to N*M tuples. By pulling the relation into configuration, we reduce the size of the stored data to N.
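
The N*M versus N comparison can be seen by generating both tuple layouts for a handful of reservations. The reservation ids and the helper function names below are made up for illustration; the three relations are the ones named in the paragraph above.

```python
# The set of relations followed from a reference (M = 3).
RELATIONS = ["GUEST", "COTRAVELLER", "BOOKER"]


def tuples_with_relation_stored(listing_id, reservation_ids):
    # Relation kept inside the principal: one tuple per (reference, relation).
    return [
        f"LISTING : {listing_id} # RESERVATION @ "
        f"Reference(RESERVATION : {rid} # {rel})"
        for rid in reservation_ids
        for rel in RELATIONS
    ]


def tuples_with_relation_in_config(listing_id, reservation_ids):
    # Relation pulled into configuration: one tuple per reference.
    return [
        f"LISTING : {listing_id} # RESERVATION @ Reference(RESERVATION : {rid})"
        for rid in reservation_ids
    ]


reservation_ids = [500, 501, 502, 503]  # N = 4, illustrative ids
stored = tuples_with_relation_stored(10, reservation_ids)
configured = tuples_with_relation_in_config(10, reservation_ids)
print(len(stored), len(configured))  # 12 4
```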

At read execution time, suppose the following tuples are stored in the database:

LISTING : 10 # OWNER @ User(123)
LISTING : 10 # RESERVATION @ Reference(RESERVATION : 500)
RESERVATION : 500 # GUEST @ User(456)

Now, if a client sends a request like:

check(LISTING : 10 : LOCATION # READ, User(456))

then based on the configuration, Himeji issues the first DB fetch based on the information from the request and the above config:

Query LISTING : 10 # RESERVATION => Match Reference(RESERVATION:500)
Query LISTING : 10 # OWNER @ User(456) => Empty

Himeji will then issue the second DB fetch, substituting in the id of the reservation found; a match indicates that user 456 is in the set of users allowed to read listing 10’s location.

Query RESERVATION : 500 # GUEST @ User(456) => Match User(456)
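
The two fetches above can be sketched end to end as follows. This is a simplified illustration, not Himeji's implementation: the `query` helper and the tuple encoding are assumptions, and the LOCATION # READ rule is hard-coded in `can_read_listing_location` instead of being read from the YAML configuration.

```python
# Stored tuples: (entity type, entity id, relation, principal).
# Principals are ("User", id) or ("Reference", entity type, entity id).
TUPLES = {
    ("LISTING", 10, "OWNER", ("User", 123)),
    ("LISTING", 10, "RESERVATION", ("Reference", "RESERVATION", 500)),
    ("RESERVATION", 500, "GUEST", ("User", 456)),
}


def query(entity_type, entity_id, relation, principal=None):
    """Return stored principals for the entity and relation, optionally
    restricted to one specific principal."""
    return [
        p
        for (t, i, r, p) in TUPLES
        if (t, i, r) == (entity_type, entity_id, relation)
        and (principal is None or p == principal)
    ]


def can_read_listing_location(listing_id, user):
    # First fetch: is the user an owner, and which reservations are referenced?
    if query("LISTING", listing_id, "OWNER", user):
        return True
    references = [
        p for p in query("LISTING", listing_id, "RESERVATION")
        if p[0] == "Reference"
    ]

    # Second fetch: follow each reference to the GUEST relation, which is
    # static and would come from configuration rather than the stored tuple.
    for _kind, ref_type, ref_id in references:
        if query(ref_type, ref_id, "GUEST", user):
            return True
    return False


guest_allowed = can_read_listing_location(10, ("User", 456))
```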


