Serendipitously, or unfortunately depending on the day, my work has required deep experience with “identity” in software, which includes both authentication and authorization. Identity modeling and implementation vary tremendously across organizations. While some best practices do exist, even specification-driven engineering leaves a surprising amount of ambiguity and inconsistency between implementations, both vendor and internal.

Authentication is verifying who or what an entity is: a user, an automated service, an IoT device, etc.

This user is _authenticated_ by Google by virtue of their Google account  
This service is _authenticated_ based on client credentials (e.g. `client_id` and `client_secret`)  

Authorization verifies that an entity should be permitted access to a resource.

The user is _authorized_ to access this endpoint because their JWT contains the `admin` role  
The service is _authorized_ to access this endpoint because it is in the `internal` network (a common example, but sub-optimal)
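To make the distinction concrete, here is a minimal sketch using only the Python standard library. `authenticate` verifies an HS256-signed JWT, which proves the token was issued by a holder of the shared secret; `authorize` then checks a role claim. The `roles` claim name, the secret, and every function name here are assumptions for illustration. The sketch skips expiry, issuer, and audience checks, so real code should rely on a vetted JWT library.

```python
import base64
import hashlib
import hmac
import json


def b64url_encode(data: bytes) -> str:
    # JWTs use unpadded base64url encoding
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(segment: str) -> bytes:
    # Restore the padding stripped during encoding
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def sign_hs256(claims: dict, secret: bytes) -> str:
    """Issue a compact HS256 JWT (illustration only)."""
    header = b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = b64url_encode(json.dumps(claims).encode())
    signature = hmac.new(secret, f"{header}.{payload}".encode(), hashlib.sha256).digest()
    return f"{header}.{payload}.{b64url_encode(signature)}"


def authenticate(token: str, secret: bytes) -> dict:
    """Authentication: confirm the token is genuine, then return its claims."""
    header_b64, payload_b64, signature_b64 = token.split(".")
    header = json.loads(b64url_decode(header_b64))
    if header.get("alg") != "HS256":
        raise PermissionError("unexpected signing algorithm")
    expected = hmac.new(secret, f"{header_b64}.{payload_b64}".encode(), hashlib.sha256).digest()
    if not hmac.compare_digest(expected, b64url_decode(signature_b64)):
        raise PermissionError("invalid signature")
    return json.loads(b64url_decode(payload_b64))


def authorize(claims: dict, required_role: str) -> bool:
    """Authorization: decide whether the already-authenticated entity may proceed."""
    return required_role in claims.get("roles", [])
```

Note the ordering: authorization only ever runs on claims that authentication has already vouched for.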

I first dipped a toe into the waters of identity systems at a small startup in the food/beverage space. At the time, they were using a homespun, OAuth 1.0-inspired token exchange for web service/machine-to-machine flows. SAML and SSO were not yet on the company's radar; user sessions were managed by the web framework engineering had chosen.

A while later, I was part of a team configuring APIs through a third-party identity provider, just as microservice patterns were gaining adoption at enterprise scale. The system worked at the gateway level, and it became my first deep dive into OAuth2, OIDC, and SAML.

As I moved through my career, identity kept coming up. At one startup I saw a terrible implementation written in PHP: the registration method marked a user as active, then performed a database action, and unset the activation if that action failed. A crash or runtime error between those steps could leave an active user with no way to log in or re-register. Through happy circumstance, system integration kept landing on my plate, and SSO and OIDC became familiar to me (disclaimer: somehow or another, I have never had to work with LDAP or Active Directory). I even designed a decoupled “roles and permissions” system for a large enterprise in the media industry; that POC informed a lot of my thinking about identity and access control.

I also spent a year administering internal tools for a large enterprise, where I learned about the challenges of managing cross-organizational access to shared systems. A surprising amount of toil keeps the wheels turning. One leader asserted that it was “cheaper” to preserve the manual effort because of the time-to-payoff curve of automation. This was short-sighted: it assumed a linear adoption curve, and adoption was slow partly because of the manual effort itself…
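The PHP registration bug comes down to doing two writes separately and trying to undo the first one by hand. The usual fix is to make both writes one transaction, so a failure at any point discards everything. The original code was PHP; this is a minimal Python `sqlite3` sketch, and the two-table schema and function names are hypothetical.

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    # Hypothetical schema: an account row plus a dependent record
    conn.execute("CREATE TABLE users (email TEXT PRIMARY KEY, active INTEGER NOT NULL)")
    conn.execute("CREATE TABLE profiles (email TEXT PRIMARY KEY REFERENCES users(email))")


def register(conn: sqlite3.Connection, email: str, fail_midway: bool = False) -> None:
    """Activate the user and create dependent records in ONE transaction.

    Using the connection as a context manager commits if the block succeeds
    and rolls back if it raises, so no failure can strand an active user
    whose remaining data was never written.
    """
    with conn:
        conn.execute("INSERT INTO users (email, active) VALUES (?, 1)", (email,))
        if fail_midway:
            # Simulates a runtime error after activation
            raise RuntimeError("simulated failure after activation")
        conn.execute("INSERT INTO profiles (email) VALUES (?)", (email,))
```

A hard crash is covered too: an uncommitted transaction is simply discarded, so there is no manual "unset" step left to miss.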

Some of my deepest, and most interesting, identity work involved connected-device certificates, with devices authenticating via mutual TLS against a public key infrastructure (PKI). In the same role, I worked on front-end bot control as well as GDPR-compliant login systems. In all of these cases, no consistent identity model existed across the organization.

The big picture: your identity system should proactively enable the organization to know, configure, monitor, audit, and analyze: “Who did what, and why?”

We’re using x, so we’re good… right?

Many medium-to-large organizations seem to choose an identity provider, have their services call endpoints hosted by that provider, and then consider the identity implementation for their system complete: “We’re using OAuth2.” Only much later, when APIs are consuming APIs and crossing network/VPC boundaries across systems, does it become apparent how hard it is to track who has access to what, and for what purpose. Then a security audit comes along, and it takes weeks of effort to catalog the various systems enforcing permissions and identity.

Why? Identity is crucial to security and reliability, a cornerstone of trust for a company (encompassing end users and machine/automation entities), so why does it present such a unique challenge? Why is it so hard to get right? Identity presents challenges at the social, organizational, and technical levels. The concern cuts across software layers and networks, yet its implementation usually gets fractured among multiple teams. And let’s be real: investing in identity is unrewarding, because the best identity implementations are nearly invisible to users and to the organization.

Most identity solutions are fine-tuned and tailored to the situation that led to their creation. Many enterprise-level auth solutions still seem geared towards servicing individual use cases, never making the transition from “n of one” to “n of many.” While OAuth2 and OIDC make representing identity and configuring user logins for a web app rather easy, dealing with identity still raises a lot of open questions. How should roles and permissions be represented? How can we ensure JWT permissions are applied consistently across services? Most vendors in the identity space have implementations that differ from the organization's use cases, and some edge case inevitably turns the “vendor solution” into one more factor that needs to be engineered around! (Note: Vendor solutions are super useful and can save thousands of engineering hours, _but you still have to know what to use for which purpose, and which use cases vendors do not handle_.)

Regardless of the vendors selected, identity design needs to occur at an organizational level, especially where multiple services are involved

A confounding source of complexity

“Identity” encompasses multiple concerns. This can be tricky, because several of the concerns that should be managed by an identity system get adopted by other teams, usually in a tightly scoped manner. For example, with GDPR and other privacy legislation either passed or in the works, a Data Privacy team often reviews architecture decisions and access patterns with multiple teams. Data access and authorization are often managed per application; sometimes a single database is connected to multiple services which manage access independently! This can lead to cross-system privilege escalation: data that one service would deny may be reachable through another service with broader access. When services handle both internal and external access, it’s common to see issues with “data chain of custody.” Say a user uses an SSO session to access a front end (a Single Page Application [SPA], for this example). The SPA sends a JWT to a back-end API. If that API then calls other services using its own credentials, the original user who triggered the downstream data request is lost from the chain of custody.

Sure, this sort of issue can be engineered around by doing things like using “on-behalf-of” headers, but this is post-hoc handling. As the system grows and these sorts of edge cases increase, things start to get complex and convoluted. It gets harder and harder to reason about the system… then, if we want to have any observability – “Who can do what, and why?” – we need to write logging queries across multiple backend systems! This assumes that all traffic is following access patterns we anticipate. Anyone who has worked in any multi-service environment for long can tell you: This does not always hold true…
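One standardized alternative to ad hoc "on-behalf-of" headers is the `act` (actor) claim from OAuth 2.0 Token Exchange (RFC 8693): `sub` stays the original user, the current actor appears under `act`, and earlier actors nest inside it. The sketch below only builds and reads claim dictionaries; in a real flow an authorization server would issue the exchanged, signed token. The function names and service identifiers are hypothetical.

```python
from typing import List


def delegated_claims(incoming: dict, service_id: str) -> dict:
    """Claims for a downstream call made by `service_id` on behalf of the original subject."""
    claims = {"sub": incoming["sub"], "act": {"sub": service_id}}
    if "act" in incoming:
        # Nest prior actors so the full delegation chain survives each hop
        claims["act"]["act"] = incoming["act"]
    return claims


def chain_of_custody(claims: dict) -> List[str]:
    """Return [original subject, current actor, ..., earliest actor]."""
    chain = [claims["sub"]]
    actor = claims.get("act")
    while actor:
        chain.append(actor["sub"])
        actor = actor.get("act")
    return chain
```

Any service along the path can now log the complete chain instead of only its immediate caller.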

Unfortunately, these sorts of interdependent system issues are not something you can simply “build yourself out of;” the best-case scenarios are a patchwork of complex cross-system monitoring. This does not scale well, as every new service adds to the system complexity and this problem space grows. For organizations looking to capitalize on a connected ecosystem, a lack of cohesive, consistent identity modeling can impede these efforts.

By the way, does your organization have an inventory of all of its actors, all of its actions, and all of the resources that might need to be protected?

Designing Identity

To build connected ecosystems, identity should be represented in a sane, consistent, and scalable model

Google and AWS have dedicated Identity and Access Management (IAM) services. This is because IAM is a cross-cutting concern. An identity system must encompass:

  • Entities (users, services, devices, etc.)
  • Actions (CRUD + domain-specific)
  • Resources (data, endpoints, files… anything that should be controlled)

The concepts of entities, actions, resources, etc. must be flexible! It does us no good to develop a system to manage the complexity of identity, only to discover that it cannot represent some identities. An “IoT device” was not a concept anyone planned for when many of the relevant RFCs were written. A strong identity model should be extensible enough to encompass entity types and systems which don’t exist yet.
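As a rough sketch of that flexibility, the model below gives entities, actions, and resources open-ended string kinds instead of closed enums, so an "iot-device" (or some type nobody has thought of yet) needs no schema change. All class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    kind: str  # "user", "service", "iot-device", ... deliberately open-ended
    id: str


@dataclass(frozen=True)
class Action:
    name: str  # CRUD verbs or domain-specific verbs such as "firmware:update"


@dataclass(frozen=True)
class Resource:
    kind: str  # "endpoint", "file", "dataset", ...
    id: str


@dataclass(frozen=True)
class AccessRequest:
    """The question behind every protected operation: may `entity` do `action` on `resource`?"""
    entity: Entity
    action: Action
    resource: Resource

    def describe(self) -> str:
        return (f"{self.entity.kind}:{self.entity.id} -> {self.action.name} "
                f"-> {self.resource.kind}:{self.resource.id}")
```

Because the classes are frozen, requests are hashable and can serve as cache keys or audit identifiers.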

At the time of this writing, authorization as a service is a nascent space. Authentication/OAuth providers are commonplace, but authorization between apps and across services is often inconsistent or non-existent. Some offerings cannot load data dynamically; others assume one massive flat file of all entities/actors. These limitations may need to be engineered around for now, but it’s important to think of vendor-provided tools as building blocks for your identity system.

Subject, Entity, Actor: These are commonly-used terms referring to the agent which wants to perform an action in your service

Resource: anything that you want to protect

Action, Permission: Verbs; CRUD, but can also be specific to a domain (e.g. method-like names)

Ideally, any time a protected resource is accessed, an authorization check gets performed against an identity system. This requires that an identity service be:

  • reliable
  • performant
  • scalable

All of the systems which manage protected resources need a way to verify permissions with the identity system. To reduce latency, running an identity-system “sidecar” in your local compute environments may be a helpful approach. This can be a challenge for systems which are not “cloud native” or are not containerized. When designing a system or choosing vendor components, it’s important to have a general idea of your services and the access patterns they need to your identity system.
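A sidecar usually cuts latency by answering repeat questions locally. The sketch below imitates that with an in-process cache of decisions placed in front of a remote check. The `remote_check` callable stands in for a hypothetical call to the identity service, and the TTL value is arbitrary.

```python
import time
from typing import Callable, Dict, Tuple

Key = Tuple[str, str, str]  # (entity, action, resource)


class CachingAuthorizer:
    """Caches allow/deny decisions from a (hypothetical) remote identity service."""

    def __init__(self, remote_check: Callable[[str, str, str], bool],
                 ttl_seconds: float = 5.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._remote_check = remote_check
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache: Dict[Key, Tuple[bool, float]] = {}  # key -> (decision, expires_at)

    def is_allowed(self, entity: str, action: str, resource: str) -> bool:
        key = (entity, action, resource)
        now = self._clock()
        cached = self._cache.get(key)
        if cached is not None and cached[1] > now:
            return cached[0]  # fresh cached decision: no network round trip
        decision = self._remote_check(entity, action, resource)
        self._cache[key] = (decision, now + self._ttl)
        return decision
```

The TTL is the design tradeoff: a longer TTL means fewer round trips, but a revoked permission can keep working until its cached decision expires.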

A comprehensive identity model needs to evaluate access patterns. This gives us a chance to implement a “zero trust” identity system!

“Zero trust” describes authenticating and authorizing every request regardless of its source; network location does not confer permissions. This is a shift from the “castle and moat” approach to security, where a network perimeter is assumed to be secure and trusted. By designing your identity system with “zero trust” principles, you can keep your systems secure even if your networks are infiltrated.
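The contrast fits in two functions. In this sketch the `10.0.0.0/8` "internal" range, the `Request` shape, and `verify_token` are all hypothetical stand-ins. The perimeter check trusts anyone already inside the network; the zero-trust check ignores the source address entirely and requires a verifiable credential.

```python
import ipaddress
from dataclasses import dataclass
from typing import Callable, Optional

INTERNAL_NETWORK = ipaddress.ip_network("10.0.0.0/8")  # hypothetical "trusted" range


@dataclass
class Request:
    source_ip: str
    token: Optional[str]


def castle_and_moat(request: Request) -> bool:
    # Perimeter model: anything originating inside the network is trusted outright
    return ipaddress.ip_address(request.source_ip) in INTERNAL_NETWORK


def zero_trust(request: Request, verify_token: Callable[[str], bool]) -> bool:
    # Every request must carry a verifiable credential; the source network is ignored
    return request.token is not None and verify_token(request.token)
```

An intruder on the internal network passes the first check with no credential at all, but fails the second.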

By designing an identity system which encompasses the access patterns, entity types, actions, and resources your organization requires, we gain consistency and scalability while preserving security and reliability. The focus shifts from “How do we enforce authentication and authorization for this service?” to “What are our entities, actions, and resources for this service?” By focusing on how your services fit into a model, you save a lot of duplicated effort across your organization.

Role Based Access Control (RBAC): A model for authorization which uses roles to determine access. Roles are collections of permissions. Many systems start with RBAC but end up having to manage a proliferation of roles.
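A minimal RBAC sketch, with hypothetical role and permission names: each role is simply a named set of permissions, and access is granted when any role the entity holds contains the requested permission.

```python
from typing import Dict, Set

# Hypothetical roles: each role is a named collection of permissions
ROLE_PERMISSIONS: Dict[str, Set[str]] = {
    "viewer": {"report:read"},
    "editor": {"report:read", "report:update"},
    "admin": {"report:read", "report:update", "report:delete", "user:manage"},
}


def rbac_allows(entity_roles: Set[str], permission: str) -> bool:
    """Grant access if any of the entity's roles includes the permission."""
    return any(permission in ROLE_PERMISSIONS.get(role, set()) for role in entity_roles)
```

Role proliferation tends to start right here: every new nuance ("editor, but only for EU reports") becomes pressure to add one more entry to the table.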

Attribute Based Access Control (ABAC): A model for authorization which uses attributes to determine access. Attributes may include metadata about the entity making the request or the resource being accessed. This enables applications to model access controls based on specific contexts.
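An ABAC policy compares attributes instead of role names. The policy below is entirely hypothetical: reads require a matching department, updates require ownership plus a business-hours context attribute, and anything else is denied by default.

```python
from typing import Any, Dict


def abac_allows(subject: Dict[str, Any], action: str, resource: Dict[str, Any],
                context: Dict[str, Any]) -> bool:
    """Hypothetical attribute-based policy; decisions depend on attributes, not roles."""
    if action == "read":
        # Readers must belong to the same department as the resource
        return subject.get("department") == resource.get("department")
    if action == "update":
        # Only the owner may update, and only during business hours
        return subject.get("id") == resource.get("owner_id") and 9 <= context.get("hour", -1) < 17
    return False  # deny anything the policy does not explicitly cover
```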

Relationship Based Access Control (ReBAC): A model for authorization which derives access from relationships between entities and resources, including hierarchies and deep nesting. This lets access follow from relational data that already exists. ReBAC powerfully captures logic such as “The owner of this resource can perform these actions against resources organized beneath it.”
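Here is a small ReBAC sketch of exactly that "owner of a folder can act on what's beneath it" rule. It uses relationship tuples loosely in the spirit of the relation tuples described for Google's Zanzibar. The tuples, hierarchy, and grants are hypothetical data, and production systems add indexing and depth limits on top of the simple ancestor walk shown here.

```python
from typing import Dict, Optional, Set, Tuple

# Hypothetical relationship tuples: (subject, relation, object)
RELATIONS: Set[Tuple[str, str, str]] = {
    ("user:alice", "owner", "folder:projects"),
    ("user:bob", "viewer", "doc:roadmap"),
}

# Hypothetical hierarchy: each object points at its parent container
PARENT: Dict[str, Optional[str]] = {
    "doc:roadmap": "folder:q3",
    "folder:q3": "folder:projects",
    "folder:projects": None,
}

# Which relations grant which actions
GRANTS: Dict[str, Set[str]] = {
    "owner": {"read", "update", "delete"},
    "viewer": {"read"},
}


def rebac_allows(subject: str, action: str, obj: str) -> bool:
    """Allow if the subject holds a granting relation on the object or any ancestor."""
    seen: Set[str] = set()  # guards against accidental cycles in the hierarchy
    current: Optional[str] = obj
    while current is not None and current not in seen:
        seen.add(current)
        for relation, actions in GRANTS.items():
            if action in actions and (subject, relation, current) in RELATIONS:
                return True
        current = PARENT.get(current)  # move up one level
    return False
```

Alice never appears on `doc:roadmap` directly, yet she may delete it because she owns an ancestor folder.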

RBAC works well when there are well-defined actors and a discrete set of permissions in the system. ABAC layers well on top of RBAC: AWS supports ABAC both in IAM and in its Cedar policy language, while also supporting roles. ReBAC is used by Google and is reflected in GCP’s IAM system. While it’s useful to let public cloud implementations inform your identity model, it’s also important not to solve problems you do not (and likely never will) have at your organization.

Before implementation, harden the identity model by gathering new use cases and attempting to model each one in your system. This will reveal use cases your model does not yet handle, and it will also help with thinking about extensibility.

Observability

Authorized transactions should be logged and monitored. Period. Well-designed software should have no way to perform operations against protected data without logging. The observability of the identity solution should be as thought-out as your entity, action, and resource modeling.
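One way to make "no unlogged access" structural is to route every authorization decision through a wrapper that emits a structured record. In this sketch the `audit` logger name, the field names, and the `(allowed, reason)` return shape of the check are assumptions. It records denials as well as approvals, since denials are also worth monitoring.

```python
import json
import logging
from datetime import datetime, timezone
from typing import Callable, Tuple

# Audit events should not be silently dropped by the default WARNING threshold
audit_log = logging.getLogger("audit")
audit_log.setLevel(logging.INFO)

# Hypothetical check signature: (entity, action, resource) -> (allowed, reason)
Check = Callable[[str, str, str], Tuple[bool, str]]


def audited(check: Check) -> Callable[[str, str, str], bool]:
    """Wrap an authorization check so every decision emits one structured audit record."""
    def wrapper(entity: str, action: str, resource: str) -> bool:
        allowed, reason = check(entity, action, resource)
        audit_log.info(json.dumps({
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "entity": entity,      # by whom
            "action": action,      # what is being performed
            "resource": resource,  # against which resource
            "reason": reason,      # why the decision was made
            "decision": "allow" if allowed else "deny",
        }))
        return allowed
    return wrapper
```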

Since your identity solution will likely be used by multiple applications, you’ll want to think about how to use your observability tools to answer the fundamental question, the reason for identity: “What is being performed, by whom, against which resources, and why?”

If you can prove that your identity system is applied consistently across your ecosystem, your security audits become much simpler.

Analytics

Most log ingestion and analysis systems have a retention limit; many vendor-provided systems, for example, use a 90-day retention policy. Identity data, retained deliberately, is a powerful tool for system-wide analytics. If you’re billing per API call, you can use your identity system to assess which calls were performed. If you’re onboarding a new customer and want to see how often they use your services, your identity system will help you answer the question. Proactively ensuring that your identity system consistently outputs reliable data will answer questions you don’t yet know you have, and that data can help your organization make better decisions.
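For the per-call billing case, a sketch like the one below could aggregate decision records. It assumes records are dictionaries with `entity`, `action`, and `decision` keys, which is a hypothetical schema, and counts only allowed calls.

```python
from collections import Counter
from typing import Dict, Iterable


def billable_calls(records: Iterable[Dict[str, str]]) -> Counter:
    """Count allowed calls per (entity, action) from structured identity records."""
    return Counter(
        (r["entity"], r["action"]) for r in records if r.get("decision") == "allow"
    )
```

The same aggregation, grouped by day instead of by action, would answer the onboarding question of how often a new customer actually uses your services.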

Conclusion

As an engineer who has worked in the identity space for a little while, across a handful of organizations, I have come to see identity as the entry point to a connected ecosystem. When no identity system exists, or the system is fractured and inconsistent, the organization incurs the complexity of one-off integrations between each pair of services; service-to-service calls become brittle and difficult to reason about. As the number of entities and actors in a system increases, it takes longer and longer to securely build out new capabilities.

On the other hand, a cohesive and consistent identity system can deliver powerful, comprehensive benefits. The identity system becomes a shared protocol that all of your systems can reason about. The permission and entity vocabulary serves as a small domain-specific language for the actions of your services. The observability and visibility you gain can inform technical as well as product/financial decisions. You can even better understand the costs of your system when you are able to consistently track the entities, actions, and resources in it.