GraphQL in Action

Introduction to GraphQL

This chapter covers

  • Understanding GraphQL and the design concepts behind it

  • How GraphQL differs from alternatives like REST APIs

  • Understanding the language used by GraphQL clients and services

  • Understanding the advantages and disadvantages of GraphQL

Necessity is the mother of invention. The product that inspired the creation of GraphQL was invented at Facebook because the company needed to solve many technical issues with its mobile application. However, I think GraphQL became so popular so fast not because it solves technical problems but rather because it solves communication problems.

Communication is hard. Improving our communication skills makes our lives better on many levels. Similarly, improving the communication between the different parts of a software application makes that application easier to understand, develop, maintain, and scale.

That’s why I think GraphQL is a game-changer. It changed how the different "ends" of a software application (frontend and backend) communicate with each other. It gives them equal power, makes them independent of each other, decouples their communication process from the underlying technical transport channel, and introduces a rich new language where the previously common language was limited to a few words.

GraphQL powers many applications at Facebook today, including the main web application at facebook.com, the Facebook mobile application, and Instagram. Developers' interest in GraphQL is very clear, and GraphQL’s adoption is growing fast. Besides Facebook, GraphQL is used in many other major web and mobile applications like GitHub, Airbnb, Yelp, Pinterest, Twitter, The New York Times, Coursera, and Shopify. Given that GraphQL is a young technology, this is an impressive list.

In this first chapter, let’s learn what GraphQL is, what problems it solves, and what problems it introduces.

1. What is GraphQL?

The word graph in GraphQL comes from the fact that the best way to represent data in the real world is with a graph-like data structure. If you analyze any data model, big or small, you’ll always find it to be a graph of objects with many relations between them.

That was the first "Aha!" moment for me when I started learning about GraphQL. Why think of data in terms of resources (in URLs) or tables when you can think of it naturally as a graph?

Note that the graph in GraphQL does not mean that GraphQL can only be used with a "graph database." You can have a document database (like MongoDB) or a relational database (like PostgreSQL) and use GraphQL to represent your API data in a graph-like structure.

The QL in GraphQL might be a bit confusing, though. Yes, GraphQL is a query language for data APIs, but that’s only from the perspective of the frontend consumer of those data APIs. GraphQL is also a runtime layer that needs to be implemented on the backend, and that layer is what makes the frontend consumer able to use the new language.

The GraphQL language is designed to be declarative, flexible, and efficient. Developers of data API consumers (like mobile and web applications) can use it to request the data they need in terms close to how they think about that data, rather than in terms of how the data is stored or how data relations are implemented.

On the backend, a GraphQL-based stack needs a runtime. That runtime provides a structure for servers to describe the data to be exposed in their APIs. This structure is what we call a schema in the GraphQL world. An API consumer can then use the GraphQL language to construct a text request representing their exact data needs. The client sends that text request to the API service through a transport channel (for example, HTTP). The GraphQL runtime layer accepts the text request, communicates with other services in the backend stack to put together a suitable data response, and then sends that data back to the consumer in a format like JSON. Figure 1.1 summarizes the dynamics of this communication.

Figure 1.1. GraphQL is a language and a runtime

Using GraphQL with other libraries

GraphQL is not specific to any backend or frontend framework, technical stack, or database. It can be used in any frontend environment, on any backend platform, and with any database engine. You can use it on any transport channel and make it use any data representation format.

In frontend web or mobile applications, you can use GraphQL by making direct Ajax calls to a GraphQL server or through a client library like Apollo or Relay (which makes the Ajax requests on your behalf). You can use a library like React (or React Native) to manage how your views use the data coming from a GraphQL service, but you can also do that with the APIs native to each UI environment (like the DOM API or native iOS components).

Although you do not need React, Apollo, or Relay to use GraphQL in your applications, these libraries add value by handling complex data-management tasks for you.
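To make the first option (a direct Ajax call) concrete, here is a minimal JavaScript sketch. It assumes the widely used convention of sending a GraphQL request as an HTTP POST whose JSON body carries the request text in a query property, plus optional variables. The buildGraphQLRequest helper and the example.com URL are hypothetical placeholders, not part of any library.

```javascript
// Hypothetical helper: builds the options for a direct HTTP POST to a
// GraphQL server. The request text travels in the "query" property of a
// JSON body; "variables" is optional.
function buildGraphQLRequest(query, variables = {}) {
  return {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ query, variables }),
  };
}

// The GraphQL request text is just a string.
const employeeQuery = `
  query {
    employee(id: 42) {
      name
      email
    }
  }
`;

const request = buildGraphQLRequest(employeeQuery);

// In a browser (or Node.js 18+), the request could then be sent with
// fetch; the URL is a placeholder, not a real service:
// fetch("https://example.com/graphql", request)
//   .then((response) => response.json())
//   .then((result) => console.log(result.data));
console.log(request.body);
```

A client library like Apollo or Relay takes care of this plumbing for you, but underneath it is still a text request sent over a transport channel.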

1.1. The big picture

In general, an API is an interface that enables communication between multiple components in an application. For example, an API can enable the communication that needs to happen between a web client and a database server. The client tells the server what data it needs, and the server fulfills the client’s requirement with objects representing the data the client asked for (figure 1.2).

Figure 1.2. The big picture of data APIs

There are different types of APIs, and every big application needs them. For GraphQL, we are specifically talking about the API type used to read and modify data, which is usually referred to as a data API.

GraphQL is one option out of many that can be used to provide applications with programmable interfaces to read and modify the data the applications need from data services. Other options include REST, SOAP, XML, and even SQL itself.

SQL (Structured Query Language) might be directly compared to GraphQL because QL is in both names, after all. Both SQL and GraphQL provide a language to query data schemas. They can both be used to read and modify data. For example, if we have a table of data about a company’s employees, the following is an example SQL statement to read data about the employees in one department.

Listing 1.1. SQL statement for querying
SELECT id, first_name, last_name, email, birth_date, hire_date
FROM employees
WHERE department = 'ENGINEERING'

Here is another example SQL statement that inserts data for a new employee.

Listing 1.2. SQL statement for mutating
INSERT INTO employees (first_name, last_name, email, birth_date, hire_date)
VALUES ('Jane', 'Doe', 'jane@example.com', '01/01/1990', '01/01/2020')

You can use SQL to communicate data operations as we did in listings 1.1 and 1.2. The database servers to which these SQL statements are sent may support different formats for their responses. Each SQL operation type has a different response. A SELECT operation might return a single row or multiple rows. An INSERT operation might return just a confirmation, the inserted rows, or an error response.

Although SQL could be used directly by mobile and web applications to communicate data requirements, it would not be a good language for that purpose. SQL is simply too powerful and too flexible, and it would introduce many challenges. For example, exposing your exact database structure publicly would be a significant security problem. You can put SQL behind another service layer, but that means you need to create a parser and analyzer to perform operations on users' SQL queries before sending them to the database. That parser/analyzer is something you get out of the box with any GraphQL server implementation.

While most relational databases directly support SQL, GraphQL is its own thing. GraphQL needs a runtime service. You cannot just start querying databases using the GraphQL query language (at least, not yet). You need to use a service layer that supports GraphQL or implement one yourself.

Some databases allow their clients to use GraphQL to query them directly. An example is Dgraph (az.dev/dgraph).

JSON is a language that can be used to communicate data. Here is a JSON object that can represent Jane’s data:

Listing 1.3. JSON object representing data
{
  "data": {
    "employee": {
      "id": 42,
      "name": "Jane Doe",
      "email": "jane@example.com",
      "birthDate": "01/01/1990",
      "hireDate": "01/01/2020"
    }
  }
}

The data communicated about Jane does not have to use the same structure as how it is saved in the database. I used camel-case property names, and I combined first_name and last_name into one name field.
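Here is a small JavaScript sketch of that reshaping step. The row object stands in for the database record (snake_case columns), toApiEmployee is a hypothetical helper that produces the shape in listing 1.3, and the email address is a placeholder.

```javascript
// A record as stored in the employees table (snake_case columns).
// The email address is a placeholder.
const row = {
  id: 42,
  first_name: "Jane",
  last_name: "Doe",
  email: "jane@example.com",
  birth_date: "01/01/1990",
  hire_date: "01/01/2020",
};

// Hypothetical mapping from the storage shape to the API shape:
// camelCase property names, and the two name columns merged into one.
function toApiEmployee(record) {
  return {
    id: record.id,
    name: `${record.first_name} ${record.last_name}`,
    email: record.email,
    birthDate: record.birth_date,
    hireDate: record.hire_date,
  };
}

const response = { data: { employee: toApiEmployee(row) } };
console.log(JSON.stringify(response, null, 2));
```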

JSON is a popular language for communicating data from API servers to client applications. Most modern data API servers use JSON to fulfill the data requirements of client applications. GraphQL servers are no exception; JSON is the popular choice for fulfilling the requirements of GraphQL data requests.

JSON can also be used by client applications to communicate their data requirements to API servers. For example, here is a possible JSON object that communicates the data requirements for the employee object response in listing 1.3.

Listing 1.4. JSON example for querying
{
  "select": {
    "fields": ["name", "email", "birthDate", "hireDate"],
    "from": "employees",
    "where": {
      "id": {
        "equals": 42
      }
    }
  }
}

GraphQL is another language that client applications can use to express their data requirements. The following is how the previous data requirement can be expressed with a GraphQL query.

Listing 1.5. GraphQL example for querying
{
  employee(id: 42) {
    name
    email
    birthDate
    hireDate
  }
}

The GraphQL query in listing 1.5 represents the same data need as the JSON object in listing 1.4, but as you can see, it has a different and shorter syntax. A GraphQL server can understand this syntax and translate it into something the data storage engine can understand (for example, the GraphQL server might translate the query into SQL statements for a relational database). Then, the GraphQL server can take what the storage engine responds with, translate it into something like JSON or XML, and send it back to the client application.
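As a rough illustration of that translation step, the following JavaScript sketch turns the JSON request from listing 1.4 into a SQL statement. It works on the JSON form only because that avoids writing a parser; a real GraphQL server would parse the query text from listing 1.5 instead. The column mapping, the allowlists, and the toSql function are all hypothetical, and the $1-style placeholders assume a PostgreSQL-style database driver.

```javascript
// Hypothetical mapping from API field names to database columns.
// "name" is stored in two columns, first_name and last_name.
const COLUMNS_FOR_FIELD = {
  name: ["first_name", "last_name"],
  email: ["email"],
  birthDate: ["birth_date"],
  hireDate: ["hire_date"],
};
// Only these tables and filter columns may appear in generated SQL.
const ALLOWED_TABLES = new Set(["employees"]);
const ALLOWED_FILTERS = new Set(["id"]);

// Translates a request shaped like listing 1.4 into a SQL statement with
// numbered placeholders ($1, $2, ...) so values are never spliced into
// the SQL text. Only "equals" conditions are supported in this sketch.
function toSql(request) {
  const { fields, from, where = {} } = request.select;
  if (!ALLOWED_TABLES.has(from)) throw new Error(`Unknown table: ${from}`);

  const columns = fields.flatMap((field) => {
    const cols = COLUMNS_FOR_FIELD[field];
    if (!cols) throw new Error(`Unknown field: ${field}`);
    return cols;
  });

  const params = [];
  const conditions = Object.entries(where).map(([column, condition]) => {
    if (!ALLOWED_FILTERS.has(column)) throw new Error(`Cannot filter on: ${column}`);
    params.push(condition.equals);
    return `${column} = $${params.length}`;
  });

  let sql = `SELECT ${columns.join(", ")} FROM ${from}`;
  if (conditions.length > 0) sql += ` WHERE ${conditions.join(" AND ")}`;
  return { sql, params };
}

const { sql, params } = toSql({
  select: {
    fields: ["name", "email", "birthDate", "hireDate"],
    from: "employees",
    where: { id: { equals: 42 } },
  },
});
console.log(sql, params);
```

Combining first_name and last_name back into a single name value is left to a later step (a resolver function, as you will see shortly).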

This is useful because, no matter which storage engines you have to deal with, GraphQL lets API servers and client applications work with one universal language for requests and one universal language for responses.

In a nutshell, GraphQL is all about optimizing data communication between a client and a server. This includes the client asking for needed data and communicating that need to the server, and the server preparing a fulfillment for that need and communicating the fulfillment back to the client. GraphQL allows clients to ask for the exact data they need and makes it easier for servers to aggregate data from multiple data storage resources.

At the core of GraphQL is a strong type system that is used to describe data and organize APIs. This type system gives GraphQL many advantages on both the server and client sides. Types ensure that clients ask for only what is possible and provide clear and helpful errors. Clients can use types to minimize any manual parsing of data elements. The GraphQL type system allows for rich features like having an introspective API and being able to build powerful tools for both clients and servers. One of the popular GraphQL tools that relies on this concept is GraphiQL, a feature-rich browser-based editor to explore and test GraphQL requests. You will learn about GraphiQL in the next chapter.

1.2. GraphQL is a specification

Although Facebook engineers started working on GraphQL in 2012, they did not release a public specification document until 2015. You can see the current version of this document by navigating to az.dev/graphql-spec; it is maintained by a community of companies and individuals on GitHub. GraphQL is an evolving language, but the specification document was a genius start for the project because it defined standard rules and practices that all implementers of GraphQL runtimes must adhere to. There have been many implementations of GraphQL libraries in many different programming languages, and all of them closely follow the specification document and update their implementations when that document is updated. If you work on a GraphQL project in Ruby and later switch to another project in Scala, the library code you write will look different, but the rules and practices will remain the same.

You can ultimately learn everything about the GraphQL language and runtime requirements in the official specification document. It is a bit technical, but you can still learn a lot from it by reading its introductory parts and examples. This book will not cover everything in the document, so I recommend that you skim through it once you are finished reading the book.

The specification document starts by describing the syntax of the GraphQL language. Let’s talk about that first.

GraphQL server libraries

Alongside the specification document, Facebook also released a reference implementation library for GraphQL runtimes in JavaScript. JavaScript is the most popular programming language and the one closest to mobile and web applications, which are two of the popular channels where using GraphQL can make a big difference. The reference JavaScript implementation of GraphQL is hosted at github.com/graphql/graphql-js, and it’s the one we use in this book. I’ll refer to this implementation as GraphQL.js.

To see a list of other GraphQL server libraries, check out az.dev/graphql-servers.

1.3. GraphQL is a language

The Q (for query) is right there in the name, and querying is usually associated with reading, but GraphQL can be used for both reading and modifying data. When you need to read data with GraphQL, you use queries; when you need to modify data, you use mutations. Both queries and mutations are part of the GraphQL language.

GraphQL operations

Queries represent READ operations. Mutations represent WRITE-then-READ operations. You can think of mutations as queries that have side effects.

In addition to queries and mutations, GraphQL also supports a third request type called a subscription, used for real-time data monitoring requests. Subscriptions represent continuous READ operations. Mutations usually trigger events for subscriptions.

GraphQL subscriptions require a data-transport channel that supports continuous pushing of data. For web applications, that is usually done with WebSockets.
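The following in-memory JavaScript sketch illustrates how a mutation (WRITE-then-READ) can trigger an event that a subscriber receives. The subscribe, publish, and createEmployee functions and the EMPLOYEE_ADDED event name are hypothetical; a real GraphQL server would push these events to clients over a channel such as WebSockets rather than calling local callbacks.

```javascript
// A tiny in-memory publish/subscribe hub: event name -> set of callbacks.
const subscribers = new Map();

function subscribe(eventName, callback) {
  if (!subscribers.has(eventName)) subscribers.set(eventName, new Set());
  subscribers.get(eventName).add(callback);
  // Return a function that stops this subscription.
  return () => subscribers.get(eventName).delete(callback);
}

function publish(eventName, payload) {
  for (const callback of subscribers.get(eventName) || []) {
    callback(payload);
  }
}

const employees = [];

// A mutation is WRITE-then-READ: it stores the new record, announces it,
// and returns it so the client can read fields from the result.
function createEmployee({ name, email }) {
  const employee = { id: employees.length + 1, name, email };
  employees.push(employee); // WRITE
  publish("EMPLOYEE_ADDED", employee); // trigger the subscription event
  return employee; // READ
}

// A subscriber keeps receiving new employees until it unsubscribes
// (a continuous READ).
const receivedNames = [];
const unsubscribe = subscribe("EMPLOYEE_ADDED", (employee) => {
  receivedNames.push(employee.name);
});

createEmployee({ name: "Jane Doe", email: "jane@example.com" });
unsubscribe();
createEmployee({ name: "John Doe", email: "john@example.com" });
console.log(receivedNames);
```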

GraphQL operations are similar to how we use SQL SELECT statements to read data and INSERT, UPDATE, and DELETE statements to modify data. The SQL language has certain rules we must follow. For example, a SELECT statement requires a FROM clause and can optionally have a WHERE clause. Similarly, the GraphQL language has certain rules to follow. For example, a GraphQL query must have a name or be the only query in a request. You will learn about the rules of the GraphQL language in the next few chapters.
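To illustrate what checking one of these rules involves, here is a small JavaScript sketch of the "must have a name or be the only one" rule (the specification calls it Lone Anonymous Operation). It assumes the request has already been parsed into a list of operations with optional names; the checkLoneAnonymousOperation function is hypothetical.

```javascript
// Each operation is described by its type and optional name, as if the
// request text had already been parsed.
function checkLoneAnonymousOperation(operations) {
  const hasAnonymous = operations.some((operation) => !operation.name);
  if (hasAnonymous && operations.length > 1) {
    return "An anonymous operation must be the only operation in the request.";
  }
  return null; // no violation of this rule
}

// Valid: a single anonymous query.
console.log(checkLoneAnonymousOperation([{ type: "query" }]));

// Invalid: an anonymous query next to a named one.
console.log(
  checkLoneAnonymousOperation([
    { type: "query" },
    { type: "query", name: "GetEmployee" },
  ])
);
```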

A query language like GraphQL (or SQL) is different from programming languages like JavaScript or Python. You cannot use the GraphQL language to create user interfaces or perform complex computations. Query languages have more specific use cases, and they often require the use of other programming languages to make them work. Nevertheless, I would like you to first think of the query language concept by comparing it to programming languages and even spoken languages like English. This is a very limited comparison, but I think it will help you understand and appreciate a few things about GraphQL.

In general, the evolution of programming languages is making them closer and closer to spoken human languages. Computers used to only understand imperative instructions, and that is why we have been using imperative paradigms to program them. However, computers today are starting to understand declarative paradigms, and we can program them to understand wishes. Declarative programming has many advantages (and disadvantages), but what makes it such a good idea is that we always prefer to reason about problems in declarative ways. Declarative thinking is easy for humans.

We can use the English language to declaratively communicate data needs and fulfillments. For example, imagine that John is the client and Jane is the server. Here is an English data communication session:

John: “Hey Jane, how long does it take sunlight to reach planet Earth?”

Jane: “A bit over 8 minutes.”

John: “How about the light from the moon?”

Jane: “A bit under 2 seconds.”

John can also easily ask both questions in one sentence, and Jane can easily answer them both by adding more words to her answer.

When we communicate using the English language, we understand special expressions like "a bit over" and "a bit under." Jane also understands that the incomplete second question is related to the first one. Computers, on the other hand, are not very good (yet) at understanding things from the context. They need more structure.

GraphQL is just another declarative language that John and Jane can use for their data communication session. It is not as good as the English language, but it is a structured language that computers can easily parse and use. For example, here’s a hypothetical single GraphQL query that represents both of John’s questions to Jane.

Listing 1.6. John’s questions to Jane in GraphQL
{
  timeLightNeedsToTravel(toPlanet: "Earth") {
    fromTheSun: from(star: "Sun")
    fromTheMoon: from(moon: "Moon")
  }
}

The example GraphQL request in listing 1.6 uses a few of the GraphQL language parts, like fields (timeLightNeedsToTravel and from), arguments (toPlanet, star, and moon), and aliases (fromTheSun and fromTheMoon). These are like the verbs and nouns of English. You will learn about all the syntax parts that you can use in GraphQL requests in chapters 2 and 3.
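Aliases matter here because both selections use the same from field. This JavaScript sketch fakes the execution of listing 1.6 to show how aliases become the keys of the response. The resolveFrom function is hypothetical, it ignores the toPlanet argument (all distances are to Earth), and it uses approximate average distances, which is enough to reproduce Jane’s "a bit over 8 minutes" and "a bit under 2 seconds."

```javascript
const SPEED_OF_LIGHT_KM_PER_S = 299792.458;
// Approximate average distances to Earth, in kilometers.
const DISTANCE_TO_EARTH_KM = { Sun: 149597870.7, Moon: 384400 };

// Hypothetical resolver for the "from" field: how many seconds light
// needs to reach Earth from the given star or moon.
function resolveFrom(args) {
  const body = args.star || args.moon;
  return DISTANCE_TO_EARTH_KM[body] / SPEED_OF_LIGHT_KM_PER_S;
}

// The selections under timeLightNeedsToTravel in listing 1.6. Both ask
// for the same field ("from") with different arguments; without aliases
// they would share the response key "from" and conflict.
const selections = [
  { alias: "fromTheSun", field: "from", args: { star: "Sun" } },
  { alias: "fromTheMoon", field: "from", args: { moon: "Moon" } },
];

const timings = {};
for (const { alias, field, args } of selections) {
  // The response key is the alias when there is one, else the field name.
  timings[alias || field] = resolveFrom(args);
}
const result = { timeLightNeedsToTravel: timings };

console.log(result);
```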

1.4. GraphQL is a service

If we teach a client application to speak the GraphQL language, it will be able to communicate any data requirements to a backend data service that also speaks GraphQL. To teach a data service to speak GraphQL, you implement a runtime layer and expose that layer to the clients that want to communicate with the service. Think of this layer on the server side as simply a translator of the GraphQL language, or a GraphQL-speaking agent that represents the data service. GraphQL is not a storage engine, so it cannot be a solution on its own. This is why you cannot have a server that speaks just GraphQL; you need to implement a translating runtime layer.

A GraphQL service can be written in any programming language, and it can be conceptually split into two major parts: structure and behavior:

  • The structure is defined with a strongly typed schema. A GraphQL schema is like a catalog of all the operations a GraphQL API can handle. It simply represents the capabilities of an API. GraphQL client applications use the schema to know what questions they can ask the service. The typed nature of the schema is a core concept in GraphQL. The schema is basically a graph of fields that have types; this graph represents all the possible data objects that can be read (or updated) through the service.

  • The behavior is naturally implemented with functions that in the GraphQL world are called resolver functions. They represent most of the smart logic behind GraphQL’s power and flexibility. Each field in a GraphQL schema is backed by a resolver function. A resolver function defines what data to fetch for its field.

    A resolver function represents the instructions on how and where to access raw data. For example, a resolver function might issue a SQL statement to a relational database, read a file’s data directly from the operating system, or update some cached data in a document database. A resolver function is directly related to a field in a GraphQL request, and it can represent a single primitive value, an object, or a list of values or objects.
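As a sketch of the behavior part, here are a few JavaScript resolver functions grouped by the type that owns each field, similar in spirit to the resolver maps some JavaScript GraphQL libraries accept. The data (an array standing in for a table and a Map standing in for a cache) and the employees and employeeCount fields are hypothetical; the point is that a resolver can return a primitive value, an object, or a list.

```javascript
// Hypothetical raw data: an array standing in for a database table and
// a Map standing in for a cache.
const employeeRows = [
  { id: 42, first_name: "Jane", last_name: "Doe", email: "jane@example.com" },
  { id: 43, first_name: "John", last_name: "Doe", email: "john@example.com" },
];
const cache = new Map([["employeeCount", employeeRows.length]]);

// Resolver functions grouped by the type that owns each field. Each one
// receives the parent value (source) and the field's arguments (args).
const resolvers = {
  Query: {
    // Returns an object (or null when no row matches).
    employee: (source, args) => employeeRows.find((row) => row.id === args.id) || null,
    // Returns a list of objects.
    employees: () => employeeRows,
    // Returns a primitive value read from the cache.
    employeeCount: () => cache.get("employeeCount"),
  },
  Employee: {
    // Derives a primitive value from the parent object.
    name: (source) => `${source.first_name} ${source.last_name}`,
  },
};

console.log(resolvers.Employee.name(resolvers.Query.employee(null, { id: 42 })));
```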

The GraphQL restaurant analogy

A GraphQL schema is often compared to a restaurant menu. In that analogy, the wait staff act like instances of the GraphQL API interface. No wonder we use the term server!

Table servers take your orders back to the kitchen, which is the core of the API service. You can compare items on the menu to fields in the GraphQL language. If you order a steak, you need to tell your server how you would like it cooked. That’s where you can use field arguments:

order {
  steak(doneness: MEDIUMWELL)
}

Let’s say this restaurant is very busy and hired a chef with the sole responsibility of cooking steaks. This chef is the resolver function for the steak field!

Resolver functions are why GraphQL is often compared to the remote procedure call (RPC) distributed computing concept. GraphQL is essentially a way for clients to invoke remote — resolver — functions.

1.4.1. An example of a schema and resolvers

To understand how resolvers work, let’s take the query in listing 1.5 (simplified) and assume a client sent it to a GraphQL service.

Listing 1.7. Simplified example query text
query {
  employee(id: 42) {
    name
    email
  }
}

The service can receive and parse any request. It then tries to validate the request against its schema. The schema has to support an employee field, and that field has to represent an object with an id argument, a name field, and an email field. Fields and arguments must have types in GraphQL. The id argument is an integer. The name and email fields are strings. The employee field is a custom type (representing that exact id/name/email structure).
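Here is a simplified JavaScript sketch of that validation step. The schema and query are hand-written objects (a real server derives them from the schema language and the parsed request text), and the error messages are illustrative. The schema assumes the employee field lives on a root Query type and takes a required Int id argument. The sketch checks field names, required arguments, and subfield selections only; it does not check argument value types.

```javascript
// Hand-written schema description. "!" marks a required (non-null) type.
const schema = {
  Query: {
    employee: { type: "Employee", args: { id: "Int!" } },
  },
  Employee: {
    name: { type: "String!" },
    email: { type: "String!" },
  },
};

// Walks the requested fields alongside the schema and collects errors.
function validate(selections, typeName, errors = []) {
  const type = schema[typeName];
  for (const [fieldName, field] of Object.entries(selections)) {
    const definition = type[fieldName];
    if (!definition) {
      errors.push(`Cannot query field "${fieldName}" on type "${typeName}".`);
      continue;
    }
    // Required ("!") arguments must be provided.
    const args = field.args || {};
    for (const [argName, argType] of Object.entries(definition.args || {})) {
      if (argType.endsWith("!") && args[argName] === undefined) {
        errors.push(`Field "${fieldName}" requires argument "${argName}".`);
      }
    }
    // Object types (those described in the schema) need subfields.
    const namedType = definition.type.replace(/!$/, "");
    if (schema[namedType]) {
      if (field.selections) {
        validate(field.selections, namedType, errors);
      } else {
        errors.push(`Field "${fieldName}" must have a selection of subfields.`);
      }
    }
  }
  return errors;
}

// The query from listing 1.7 as a tree of field selections.
const query = {
  employee: { args: { id: 42 }, selections: { name: {}, email: {} } },
};
console.log(validate(query, "Query"));
```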

Just like the client-side query language, the GraphQL community standardized a server-side language dedicated to creating GraphQL schema objects. This language is known as the schema language. It’s often abbreviated SDL (schema definition language) or IDL (interface definition language).

Here’s an example to represent the Employee type using GraphQL’s schema language.

Listing 1.8. GraphQL schema language example
type Query {
  employee(id: Int!): Employee
}

type Employee {
  name: String!
  email: String!
}

This custom Employee type represents the structure of an employee "model." The employee field on the root Query type looks up an employee object by an integer id argument, and every Employee object has name and email string fields.

The exclamation points after the types mark them as non-null. A client cannot ask for an employee field without specifying an id argument, and a valid server response for an employee must include a name string and an email string.

The schema language type definitions are like the database CREATE statements used to define tables (and other database schema elements).

Using this type, the GraphQL service can conclude that the GraphQL query in listing 1.7 is valid because it matches the supported type structure. The next step is to prepare the data it is asking for. To do that, the GraphQL service traverses the tree of fields in that request and invokes the resolver function associated with each field. It then gathers the data returned by these resolver functions and uses it to form a single response.

This example GraphQL service needs at least three resolver functions: one for the employee field, one for the name field, and one for the email field.

The employee field’s resolver function might, for example, run a query like SELECT * FROM employees WHERE id = 42. This SQL statement returns all columns available on the employees table. Let’s say the employees table happens to have the following columns: id, first_name, last_name, email, birth_date, and hire_date.

Then the employee field’s resolver function for employee #42 might return an object like the following.

Listing 1.9. Response from the database for employee #42
{
  "id": 42,
  "first_name": "Jane",
  "last_name": "Doe",
  "email": "jane@example.com",
  "birth_date": "01/01/1990",
  "hire_date": "01/01/2020"
}

The GraphQL service continues to traverse the fields in the tree one by one, invoking the resolver function for each field. Each resolver function is passed the result of executing the resolver function of its parent node. So both the name and email resolver functions receive the object in listing 1.9 (as their first argument).

Let’s say we have the following (JavaScript) functions representing the server resolver functions for the name and email fields:

// Resolver functions for the name and email fields.
// Each receives the parent object (source) as its first argument.
const name = (source) => `${source.first_name} ${source.last_name}`;
const email = (source) => source.email;

Here, the source object is the parent node.

The email resolver function is known as a "trivial" resolver because the email field name matches the email property name on the parent source object. Some GraphQL implementations (for example, the JavaScript implementation) have built-in trivial resolvers and use them as default resolvers if no resolver is found for a field.

The GraphQL service uses all the responses of these three resolver functions to put together the following single response for the query in listing 1.7.

Listing 1.10. Example GraphQL response object
{
  "data": {
    "employee": {
      "name": "Jane Doe",
      "email": "jane@example.com"
    }
  }
}
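The whole walk can be sketched in a few lines of JavaScript. This is a deliberately simplified executor, not how GraphQL.js is implemented: the employeesTable array stands in for the database, resolvers receive (source, args, fieldName) instead of GraphQL.js’s (source, args, context, info), and lists are not handled. It does show the two ideas described above: each resolver receives its parent’s result, and a field without its own resolver (email here) falls back to a trivial default resolver.

```javascript
// Hypothetical stand-in for the employees table (placeholder email).
const employeesTable = [
  { id: 42, first_name: "Jane", last_name: "Doe", email: "jane@example.com",
    birth_date: "01/01/1990", hire_date: "01/01/2020" },
];

const resolvers = {
  Query: {
    // Plays the role of SELECT * FROM employees WHERE id = 42.
    employee: (source, args) => employeesTable.find((row) => row.id === args.id),
  },
  Employee: {
    name: (source) => `${source.first_name} ${source.last_name}`,
    // No email resolver: the default (trivial) resolver handles it.
  },
};

// Which type each object field returns, so the executor knows which
// resolvers apply one level down.
const fieldTypes = { Query: { employee: "Employee" } };

// Simplified default resolver: read the property named like the field.
const defaultResolver = (source, args, fieldName) => source[fieldName];

// Walks the selection tree; each resolver receives its parent's result.
function execute(selections, typeName, source) {
  const result = {};
  for (const [fieldName, field] of Object.entries(selections)) {
    const resolve = (resolvers[typeName] || {})[fieldName] || defaultResolver;
    const value = resolve(source, field.args || {}, fieldName);
    const childType = (fieldTypes[typeName] || {})[fieldName];
    result[fieldName] = field.selections && value != null
      ? execute(field.selections, childType, value)
      : value;
  }
  return result;
}

// The query from listing 1.7 as a tree of selections.
const query = { employee: { args: { id: 42 }, selections: { name: {}, email: {} } } };
const response = { data: execute(query, "Query", null) };
console.log(JSON.stringify(response));
```

Note that birth_date and hire_date never reach the response: the client did not select them, so no resolver runs for them.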

We’ll start to explore how to write custom resolvers in chapter 5.

GraphQL does not require any specific data serialization format, but JSON is the most popular one. All the examples in this book use the JSON format.

2. Why GraphQL?

GraphQL is not the only — or even the first — technology to encourage creating efficient data APIs. You can use a JSON-based API with a custom query language or implement the Open Data Protocol (OData) on top of a REST API. Experienced backend developers have been creating efficient technologies for data APIs since long before GraphQL. So why do we need a new technology?

If you asked me to answer the "Why GraphQL?" question with a single word, that word would be standards.

GraphQL provides comprehensive standards and structures to implement API features in maintainable and scalable ways.

GraphQL makes it mandatory for data API servers to publish documentation (the schema) about their capabilities. That schema enables client applications to know everything available for them on these servers. The GraphQL standard schema has to be part of every GraphQL API. Clients can ask the service about its schema using the GraphQL language. We’ll see examples in chapter 3.

Other solutions can be made better by adding similar documentation. The unique thing about GraphQL here is that the documentation is part of how you create the API service. You cannot have out-of-date documentation. You cannot forget to document a use case. You cannot offer different ways to use APIs, because you have standards to work with. Most important, you do not need to maintain the documentation of your API separately from that API. GraphQL documentation is built-in, and it’s first class.

The mandatory GraphQL schema represents the possibilities and the limits of what can be answered by the GraphQL service. But there is some flexibility in how to use the schema because we are talking about a graph of nodes, and graphs can be traversed using many paths. This flexibility is one of the great benefits of GraphQL because it allows backend and frontend developers to make progress in their projects without needing to constantly coordinate that progress with each other. It basically decouples clients from servers and allows both of them to evolve and scale independently. This enables much faster iteration in both frontend and backend products.

I think this standard schema is among the top benefits of GraphQL — but let’s talk about the technological benefits of GraphQL as well.

One of the most significant (and perhaps most popular) technological reasons to consider a GraphQL layer between clients and servers is efficiency. API clients often need to ask the server about multiple resources, but the API server usually knows how to answer questions about only a single resource. As a result, the client ends up having to communicate with the server multiple times to gather all the data it needs (figure 1.3).

Figure 1.3. A client asking a server about multiple resources

With GraphQL, you can basically shift this multi-request complexity to the backend and have your GraphQL runtime deal with it. The client asks the GraphQL service a single question and gets a single response with precisely what the client needs (figure 1.4). You can customize a REST-based API to provide one exact endpoint per view, but that’s not the norm. You will have to implement it without a standard guide.

Figure 1.4. GraphQL shifts multi-request complexities to the backend side

Another big technological benefit of GraphQL is communicating with multiple services. When you have multiple clients requesting data from multiple data storage services (like PostgreSQL, MongoDB, and a Redis cache), a GraphQL layer in the middle can simplify and standardize this communication. Instead of a client going directly to the multiple data services, you can have that client communicate with the GraphQL service. Then the GraphQL service communicates with the different data services (figure 1.5). This is how GraphQL isolates clients from needing to communicate in multiple languages. A GraphQL service translates a single client’s request into multiple requests to multiple services using different languages.

Figure 1.5. GraphQL can communicate with different data services.

GraphQL is a translator

Imagine three people who speak three different languages and have different types of knowledge. Then imagine that you have a question that can only be answered by combining the knowledge of all three people. If you have a translator who speaks all three languages, the task of putting together an answer to your question becomes easy. That is what a GraphQL service can do for clients. This point is valid with other data API options, but GraphQL provides standard structures that enable implementing this kind of data need in easier and more maintainable ways.
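The translator idea can be sketched as a single JavaScript resolver that asks two data services for their parts of the answer at the same time and merges the results. The profileService and activityService clients, their canned data, and the lastLogin field are hypothetical stubs standing in for, say, PostgreSQL and a Redis cache.

```javascript
// Hypothetical stubs for two different data services. Real clients would
// send network requests; these resolve immediately with canned data.
const profileService = {
  getEmployee: async (id) => ({ id, name: "Jane Doe", email: "jane@example.com" }),
};
const activityService = {
  getLastLogin: async (id) => (id === 42 ? "2020-01-02T09:00:00Z" : null),
};

// One resolver answers the client's single question. It asks both
// services concurrently and merges their answers into one object.
async function resolveEmployeeSummary(source, args) {
  const [profile, lastLogin] = await Promise.all([
    profileService.getEmployee(args.id),
    activityService.getLastLogin(args.id),
  ]);
  return { ...profile, lastLogin };
}

resolveEmployeeSummary(null, { id: 42 }).then((summary) => console.log(summary));
```

The client never learns that two services were involved; it sends one request and receives one merged answer.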

One other benefit of GraphQL that I think is often underrated is how it improves the frontend developer experience. The GraphQL schema gives frontend developers a lot of power and control to explore, construct, validate, test, and accurately perform data-need communication without depending on backend developers. It eliminates the need for the server to hardcode the shape or size of the data, and it decouples clients from servers. This means clients and servers can be developed and maintained separately from each other, which is a significant benefit on its own.

More important, with GraphQL, developers express their UI data requirements using a declarative language. They express what they need, not how to make it available. There is a tight relationship between what data a UI needs and the way a developer can describe that data need in GraphQL.

2.1. What about REST APIs?

GraphQL APIs are often compared to REST APIs because the latter have been the most popular choice for data APIs demanded by web and mobile applications. GraphQL provides a more efficient technological alternative to REST APIs. But why do we need an alternative? What is wrong with REST APIs?

The biggest relevant problem with REST APIs is the client’s need to communicate with multiple data API endpoints. REST APIs are an example of servers that require clients to do multiple network round trips to get data. A REST API is a collection of endpoints where each endpoint represents a resource. So, when a client needs data about multiple resources, it has to perform multiple network requests to that REST API and then put together the data by combining the multiple responses it receives. This is a significant problem, especially for mobile applications, because mobile devices usually have processing, memory, and network constraints.

Furthermore, in a REST API, there is no client request language. Clients do not have control over what data the server will return because they do not have a language to communicate their exact needs. More accurately, the language available for clients of a REST API is very limited. For example, the READ REST API endpoints are either GET /ResourceName, to get a list of all the records for that resource, or GET /ResourceName/ResourceID to get a single record identified by an ID.

In a pure REST API (not a customized one), a client cannot specify which fields to select for a record in that resource. That information is in the REST API service itself, and the REST API service always returns all the fields regardless of which ones the client actually needs. GraphQL’s term for this problem is over-fetching of information that is not needed. It is a waste of network and memory resources for both the client and the server.

One other big problem with REST APIs is versioning. If you need to support multiple versions, that usually means new endpoints. This leads to more problems while using and maintaining these endpoints, and it might be the cause of code duplication on the server.

The REST API problems mentioned here are specific to what GraphQL is trying to solve. They are certainly not all of the problems with REST APIs.

REST APIs eventually turn into a mix of regular REST endpoints plus custom ad hoc endpoints crafted for performance reasons. This is where GraphQL offers a much better alternative.

It is important to point out here that REST APIs have some advantages over GraphQL APIs. For example, caching a REST API response is easier than caching a GraphQL API response, as you will see in the last section of this chapter. Also, optimizing the code for different REST endpoints is potentially easier than optimizing the code for a single generic endpoint. There is no single magical solution that fixes all issues without introducing new challenges. REST APIs have their place, and when used correctly, both GraphQL and REST have great applications. Also, nothing prohibits using them together in the same system.

REST-ish APIs

Please note that in this book, I am talking about pure REST APIs. Some of the problems mentioned here and solved by GraphQL can also be solved by customizing REST APIs. For example, you can modify the REST API to accept an include query-string parameter that takes a comma-separated list of fields to return in the response. This will avoid the over-fetching problem. You can also make a REST API include sub-resources with some query flags. There are tools out there that you can add on top of REST-based systems to enable such customizations or make them easier to implement.
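Here is a minimal sketch of the server side of such an include parameter. The in-memory record, the `selectFields` and `handleGetPerson` names, and the URL format are assumptions for illustration. They are not part of any particular framework.

```javascript
// Hypothetical in-memory "people" resource (values taken from this chapter).
const people = {
  4: { id: 4, name: "Darth Vader", birthYear: "41.9BBY", planetId: 1, filmIds: [1, 2, 3, 6] },
};

// Keep only the fields named in a comma-separated include value.
// Unknown field names are silently ignored.
function selectFields(record, includeParam) {
  if (!includeParam) return record; // no include: pure REST behavior, return everything
  const wanted = includeParam.split(",").map((f) => f.trim()).filter(Boolean);
  const result = {};
  for (const field of wanted) {
    if (Object.prototype.hasOwnProperty.call(record, field)) {
      result[field] = record[field];
    }
  }
  return result;
}

// Handle a request URL such as "/people/4?include=name,birthYear".
function handleGetPerson(url) {
  const parsed = new URL(url, "http://localhost"); // a base is needed for relative URLs
  const id = parsed.pathname.split("/").pop();
  const record = people[id];
  if (!record) return { status: 404, body: null };
  return { status: 200, body: selectFields(record, parsed.searchParams.get("include")) };
}

console.log(handleGetPerson("/people/4?include=name,birthYear").body);
// { name: 'Darth Vader', birthYear: '41.9BBY' }
```

This avoids over-fetching. The client still needs separate requests for the planet and the films, though, so the round-trip problem remains.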

Such approaches might be okay on a small scale, and I have personally used them with some success. However, compared to what GraphQL offers, these approaches require a lot of work and cause slower iterations in projects. They are also not standardized and do not scale well for big projects.

2.2. The GraphQL way

To see the GraphQL way of solving the REST API problems we have talked about, you need to understand the concepts and design decisions behind GraphQL. Let’s look at the major ones.

2.2.1. The typed Graph schema

To create a GraphQL API, you need a typed schema. A GraphQL schema contains fields that have types. Those types can be primitive or custom. Everything in the GraphQL schema requires a type. This static type system is what makes a GraphQL service predictable and discoverable.
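Here is a toy sketch that makes this concrete. It is not a real GraphQL library. It represents a schema as a plain JavaScript object and uses it to reject a request before any data is fetched. The type map, the nested-object selection format, and the `validate` function are assumptions for illustration. A real GraphQL server performs this kind of validation against its schema for you.

```javascript
// A toy typed schema (hypothetical): every field of every type has a type.
// Scalar types are primitives; any other name refers to a custom object type.
const SCALARS = new Set(["String", "Int", "Float", "Boolean", "ID"]);

const schema = {
  Query: { person: "Person" },
  Person: { name: "String", birthYear: "String", planet: "Planet", films: "[Film]" },
  Planet: { name: "String" },
  Film: { title: "String" },
};

const own = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);

// Unwrap list and non-null markers so "[Film]" becomes the named type "Film".
const namedType = (type) => type.replace(/[[\]!]/g, "");

// Check a selection (nested object, leaf fields set to `true`) against a type
// before executing anything. Returns error messages; [] means the request is valid.
function validate(typeName, selection) {
  const errors = [];
  for (const [field, sub] of Object.entries(selection)) {
    if (!own(schema[typeName], field)) {
      errors.push(`Cannot query field "${field}" on type "${typeName}"`);
      continue;
    }
    const target = namedType(schema[typeName][field]);
    const isLeaf = SCALARS.has(target);
    if (isLeaf && sub !== true) {
      errors.push(`Field "${field}" of type "${target}" must not have a sub-selection`);
    } else if (!isLeaf && sub === true) {
      errors.push(`Field "${field}" of type "${target}" must have a sub-selection`);
    } else if (!isLeaf) {
      errors.push(...validate(target, sub)); // recurse into the custom type
    }
  }
  return errors;
}

console.log(validate("Query", { person: { name: true, planet: { name: true } } })); // []
console.log(validate("Query", { person: { name: true, height: true } }));
```

The schema states exactly which fields exist and what type each one has. A request for `height`, a field this toy schema does not define, therefore fails before execution, and tools can list every valid field by reading the same schema.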

2.2.2. The declarative language

GraphQL has a declarative nature for expressing data requirements. It provides clients with a declarative language for expressing their data needs. This declarative nature enables a thinking model in the GraphQL language that is close to the way we think about data requirements in English, and it makes working with a GraphQL API a lot easier than the alternatives.

2.2.3. The single endpoint and client language

To solve the multiple round-trip problem, GraphQL makes the responding server work as a single endpoint. Basically, GraphQL takes the custom endpoint idea to an extreme and makes the whole server a single smart endpoint that can reply to all data requests.

The other significant concept that goes with the single smart endpoint is the rich client request language needed to work with that single endpoint. Without a client request language, a single endpoint is useless. It needs a language to process a custom request and respond with data for that custom request.

Having a client request language means clients are in control. They can ask for exactly what they need, and the server will reply with exactly what they ask for. This solves the problem of over-fetching data that is not needed.
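Here is a minimal sketch of "ask for exactly what you need." A single function plays the role of the smart endpoint. It walks a selection over in-memory data and copies only the requested fields. The data, the nested-object selection format (standing in for a parsed query string), and the `execute` and `resolveSelection` names are assumptions for illustration. They do not come from a real GraphQL library.

```javascript
// Hypothetical in-memory data standing in for three backend resources.
const db = {
  people: { 4: { name: "Darth Vader", birthYear: "41.9BBY", planetId: 1, filmIds: [1, 2, 3, 6] } },
  planets: { 1: { name: "Tatooine" } },
  films: {
    1: { title: "A New Hope" },
    2: { title: "The Empire Strikes Back" },
    3: { title: "Return of the Jedi" },
    6: { title: "Revenge of the Sith" },
  },
};

// Relationship fields are computed from stored IDs; other fields are read as-is.
// (Keyed only by field name to keep the sketch short; real servers key by type too.)
const resolvers = {
  planet: (person) => db.planets[person.planetId],
  films: (person) => person.filmIds.map((id) => db.films[id]),
};

// Copy only the selected fields (leaf fields are `true`), recursing into
// nested objects and lists so the response mirrors the request's shape.
function resolveSelection(source, selection) {
  if (Array.isArray(source)) return source.map((item) => resolveSelection(item, selection));
  const result = {};
  for (const [field, sub] of Object.entries(selection)) {
    const hasResolver = Object.prototype.hasOwnProperty.call(resolvers, field);
    const value = hasResolver ? resolvers[field](source) : source[field];
    result[field] = sub === true ? value : resolveSelection(value, sub);
  }
  return result;
}

// The single "smart endpoint": one request in, one exactly shaped response out.
function execute(personId, selection) {
  return { data: { person: resolveSelection(db.people[personId], selection) } };
}

const response = execute(4, {
  name: true,
  birthYear: true,
  planet: { name: true },
  films: { title: true },
});
console.log(JSON.stringify(response, null, 2));
```

The response has the same shape as listing 1.11. It contains no `planetId` or `filmIds`, because the client did not ask for them.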

Furthermore, having clients ask for exactly what they need enables backend developers to generate more useful analytics about what data is being used and what parts of the data are in higher demand. This is very useful information. For example, it can be used to scale and optimize data services based on usage patterns. It can also be used to detect abnormalities and client version changes.
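Here is a sketch of such analytics: a function that counts how often each field path appears in incoming requests. It assumes requests have already been parsed into nested selection objects, with leaf fields set to `true`. The `recordUsage` name and the sample traffic are hypothetical.

```javascript
// Count how often each field path (e.g. "person.planet.name") is requested.
function recordUsage(selection, counts = new Map(), prefix = "") {
  for (const [field, sub] of Object.entries(selection)) {
    const path = prefix ? `${prefix}.${field}` : field;
    counts.set(path, (counts.get(path) || 0) + 1);
    if (sub !== true) recordUsage(sub, counts, path); // descend into nested fields
  }
  return counts;
}

// Hypothetical traffic: two clients asking for different parts of the same data.
const requests = [
  { person: { name: true, planet: { name: true } } },
  { person: { name: true, films: { title: true } } },
];

const usage = new Map();
for (const selection of requests) recordUsage(selection, usage);

// Sort paths by demand to see which parts of the data are requested most.
const ranked = [...usage.entries()].sort((a, b) => b[1] - a[1]);
console.log(ranked);
```

A real service would more likely key counts by type and field (for example, `Person.name`) and keep them per client version. Keying them that way is what makes detecting abnormalities and client version changes practical.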

2.2.4. The simple versioning

When it comes to versioning, GraphQL has an interesting take. Versioning can be avoided altogether. Basically, you can add new fields and types without removing the old ones because you have a graph and can flexibly grow it by adding more nodes. You can leave paths on the graph for old APIs and introduce new ones. The API just grows, and no new endpoints are needed. Clients can continue to use older features, and they can also incrementally update their code to use new features.

Using a single evolving version, GraphQL APIs give clients continuous access to new features and encourage cleaner, more maintainable server code.

This is especially important for mobile clients because you cannot control the version of the API they are using. Once installed, a mobile app might continue to use that same old version of the API for years. On the web, it is easy to control the API version because you can just push new code and force all users to use it. For mobile apps, this is a lot harder to do.

This simple versioning approach has some challenges. Keeping old nodes forever introduces downsides. More maintenance effort is required to make sure old nodes still work as they should. Furthermore, users of the APIs might be confused about which fields are old and which are new. GraphQL offers a way to deprecate (and hide) older nodes so that readers of the schema only see the new ones. Once a field is deprecated, the maintainability problem becomes a question of how long old users continue to use it. The great thing here is that as a maintainer, you can confidently answer the questions "Is a field still being used?" and "How often is a field being used?" thanks to the client query language. The removal of unused, deprecated fields can even be automated.
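Here is a sketch of the deprecate-then-remove workflow. Each field definition carries an optional `deprecationReason`, the name GraphQL introspection uses for this. The `visibleFields` default mirrors the `includeDeprecated` argument of introspection. The `personFields` type, the choice of `planet` as the deprecated field, the usage numbers, and the `removalCandidates` helper are hypothetical.

```javascript
// Field definitions for a hypothetical Person type. A deprecated field is still
// served to old clients; it is only hidden from people reading the schema.
const personFields = {
  name: { type: "String" },
  birthYear: { type: "String" },
  homeworld: { type: "Planet" },
  planet: { type: "Planet", deprecationReason: "Use `homeworld` instead." },
};

// What schema readers see by default: deprecated fields are filtered out.
function visibleFields(fields, includeDeprecated = false) {
  return Object.keys(fields).filter((name) => includeDeprecated || !fields[name].deprecationReason);
}

// Deprecated fields whose recent usage is at or below a threshold are
// candidates for (possibly automated) removal. `usage` maps field -> count.
function removalCandidates(fields, usage, threshold = 0) {
  return Object.keys(fields).filter(
    (name) => Boolean(fields[name].deprecationReason) && (usage[name] || 0) <= threshold
  );
}

// Made-up usage counts, e.g. gathered from per-field request analytics.
const recentUsage = { name: 1200, birthYear: 800, homeworld: 950, planet: 0 };

console.log(visibleFields(personFields)); // [ 'name', 'birthYear', 'homeworld' ]
console.log(removalCandidates(personFields, recentUsage)); // [ 'planet' ]
```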

2.3. REST APIs and GraphQL APIs in action

Let’s go over a one-to-one comparison example between a REST API and a GraphQL API. Imagine that you are building an application to represent the Star Wars films and characters. The first UI you tackle is a view to show information about a single Star Wars character. This view should display the character’s name, birth year, the name of their planet, and the titles of all the films in which they appeared. For example, for Darth Vader, along with his name, the view should display his birth year (41.9BBY), his planet name (Tatooine), and the titles of the four Star Wars films in which he appeared (A New Hope, The Empire Strikes Back, Return of the Jedi, and Revenge of the Sith).

As simple as this view sounds, you are actually dealing with three different resources: Person, Planet, and Film. The relationship between these resources is simple. We can easily guess the shape of the data needed. A person object has exactly one planet object and one or more film objects.

The JSON data for this view could be something like the following.

Listing 1.11. JSON data example object for a UI component
{
  "data": {
    "person": {
      "name": "Darth Vader",
      "birthYear": "41.9BBY",
      "planet": {
        "name": "Tatooine"
      },
      "films": [
        { "title": "A New Hope" },
        { "title": "The Empire Strikes Back" },
        { "title": "Return of the Jedi" },
        { "title": "Revenge of the Sith" }
      ]
    }
  }
}

Assuming that a data service can give us this exact structure, here is one possible way to represent its view with a frontend component library like React.js.

Listing 1.12. UI view example in React.js
// The Container Component:
<PersonProfile person={data.person}></PersonProfile>

// The PersonProfile Component:
const PersonProfile = ({ person }) => (
  <div>
    <div>Name: {person.name}</div>
    <div>Birth Year: {person.birthYear}</div>
    <div>Planet: {person.planet.name}</div>
    <div>Films: {person.films.map(film => film.title).join(", ")}</div>
  </div>
);

This is a very simple example. Our experience with Star Wars helped us design the shape of the needed data and figure out how to use it in the UI.

Note one important thing about the UI view in listing 1.12: its relationship with the JSON data object in listing 1.11 is very clear. The UI view used all the "keys" from the JSON data object. See the values in curly brackets in listing 1.12.

Now, how can you ask a REST API service for the data in listing 1.11?

You need a single character’s information. Assuming that you know that character’s ID, a REST API is expected to expose that information with an endpoint like this:

GET - /people/{id}

This request will give you the name, birthYear, and other information about the character. A REST API will also give you access to the ID of this character’s planet and an array of IDs for all the films they appeared in.

The JSON response for this request could be something like the following:

{
  "name": "Darth Vader",
  "birthYear": "41.9BBY",
  "planetId": 1,
  "filmIds": [1, 2, 3, 6],
  ·-·-·   (1)
}
1 Other information that is not needed for this view
Throughout this book, I use ·-·-· in code listings to indicate omitted content. This is to distinguish it from the three-dots syntax (...), which is part of both JavaScript and GraphQL (see az.dev/js-intro).

Then, to read the planet’s name, you ask

GET - /planets/1

And to read the film titles, you ask

GET - /films/1
GET - /films/2
GET - /films/3
GET - /films/6

Once you have all six responses from the server, you can combine them to satisfy the view’s data need.

Besides the fact that you had to do six network round trips to satisfy a simple data need for a simple UI, the whole approach here is imperative. You give instructions on how to fetch the data and how to process it to make it ready for the view. For example, you have to deal with the planet and film IDs, although the view does not really need them. You have to manually combine multiple data objects, although you are implementing a single view that naturally needs just a single data object.
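The imperative work this requires can be sketched with a fake, synchronous `get` function that stands in for HTTP and counts round trips. The in-memory responses mirror the example above. `get`, `loadPersonView`, and the synchronous style are assumptions that keep the sketch short; a real client would `await fetch(...)`.

```javascript
// Fake REST server: URL -> JSON response. Each get() call is one round trip.
const responses = {
  "/people/4": { name: "Darth Vader", birthYear: "41.9BBY", planetId: 1, filmIds: [1, 2, 3, 6] },
  "/planets/1": { name: "Tatooine" },
  "/films/1": { title: "A New Hope" },
  "/films/2": { title: "The Empire Strikes Back" },
  "/films/3": { title: "Return of the Jedi" },
  "/films/6": { title: "Revenge of the Sith" },
};

let roundTrips = 0;
function get(url) {
  roundTrips += 1;
  const body = responses[url];
  if (!body) throw new Error(`404 for ${url}`);
  return body;
}

// Imperative client code: fetch each resource, then stitch the view's object together.
function loadPersonView(id) {
  const person = get(`/people/${id}`);
  const planet = get(`/planets/${person.planetId}`); // needs an ID the view never shows
  const films = person.filmIds.map((filmId) => get(`/films/${filmId}`));
  return {
    name: person.name,
    birthYear: person.birthYear,
    planet: { name: planet.name },
    films: films.map((film) => ({ title: film.title })),
  };
}

const view = loadPersonView(4);
console.log(roundTrips); // 6
```

With a real asynchronous client, the four film requests could be sent in parallel with `Promise.all`. That shortens the wait, but it still costs six requests, and the stitching code remains the client's job.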

Try asking for this data from a REST API yourself. The Star Wars data has an excellent REST API called SWAPI, which you can find at https://az.dev/swapi. Construct the same data object there. The names of the data elements might be a bit different, but the endpoint structure is the same. You will need to do exactly six API calls. Furthermore, you will have to over-fetch information that the view does not need.

Of course, SWAPI is just one pure implementation of a REST API for this data. There could be better custom implementations that make this view’s data needs easier to fulfill. For example, if the API server implemented nested resources and understood the relationship between a person and a film, you could read the film data (along with the character data) with something like this:

GET - /people/{id}/films

However, a pure REST API would not have that out of the box. You would need to ask the backend engineers to create this custom endpoint for your view. This is the reality of scaling a REST API: you add custom endpoints to efficiently satisfy clients' growing needs. Managing custom endpoints like these is hard.

For example, if you customized your REST API endpoint to return the film data for a character, that would work great for the view you are currently implementing. However, in the future, you might need to implement a shorter or longer version of the character’s profile information. Maybe you will need to show only one of their films or display the film description in addition to the title. Every new requirement will mean a change must be made to customize the endpoint further or come up with new endpoints to optimize the communication needed for the new views. This approach is simply limited.

Let’s now look at the GraphQL approach.

A GraphQL server is a single smart endpoint. The transport channel does not matter. If you are doing this over HTTP, the HTTP method certainly does not matter either. Let’s assume that you have a single GraphQL endpoint exposed over HTTP at /graphql.

Since you want to ask for data in a single network round trip, you must have a way to express the complete data needs for the server to parse. You do this with a GraphQL query:

GET or POST - /graphql?query={·-·-·}

A GraphQL query is just a string, but it must include all the pieces of the data that you need. This is where the declarative power comes in.
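For reference, here is how a client might build either request. A `query` URL parameter for GET and a JSON body with a `query` key for POST follow the common GraphQL-over-HTTP convention. The `/graphql` path is the one assumed above, the query text borrows the `person(personID: 4)` form of the public Star Wars GraphQL API, and the `buildRequest` helper is hypothetical.

```javascript
// The whole data need travels as one string.
const query = `{
  person(personID: 4) {
    name
    birthYear
  }
}`;

// Build a fetch-style request description for either HTTP method.
function buildRequest(endpoint, queryText, method = "POST") {
  if (method === "GET") {
    // GET: the query goes into the URL, so it must be URL-encoded.
    return { url: `${endpoint}?query=${encodeURIComponent(queryText)}`, init: { method: "GET" } };
  }
  // POST: the query goes into a JSON request body.
  return {
    url: endpoint,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ query: queryText }),
    },
  };
}

const { url, init } = buildRequest("/graphql", query, "GET");
// Against a real server: fetch(url, init).then((res) => res.json())
console.log(url);
```

Either way, it is one request and one response. GET requests can be cached by URL more easily, while POST avoids URL-length limits for large queries.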

Let’s compare how this simple view’s data requirement can be expressed with English and with GraphQL.

Table 1.1. How GraphQL is close to English

In English:

The view needs:
  a person’s name,
  birth year,
  planet’s name,
  and the titles of all their films.

In GraphQL:

{
  person(ID: ·-·-·) {
    name
    birthYear
    planet {
      name
    }
    films {
      title
    }
  }
}

Can you see how close the GraphQL expression is to the English version? It is as close as it can get. Furthermore, compare the GraphQL query with the original JSON data object that we started with.

Table 1.2. The similar structure between a GraphQL query and its response

GraphQL query (question):

{
  person(ID: ·-·-·) {
    name
    birthYear
    planet {
      name
    }
    films {
      title
    }
  }
}

Needed JSON (answer):

{
  "data": {
    "person": {
      "name": "Darth Vader",
      "birthYear": "41.9BBY",
      "planet": {
        "name": "Tatooine"
      },
      "films": [
        { "title": "A New Hope" },
        { "title": "The Empire Strikes Back" },
        { "title": "Return of the Jedi" },
        { "title": "Revenge of the Sith" }
      ]
    }
  }
}

The GraphQL query has exactly the structure of the JSON data object, minus all the "value" parts (such as "Darth Vader" and "41.9BBY"). If you think of this in terms of a question-answer relation, the question is the answer statement without the answer part:

If the answer statement is:
The name of the Star Wars character who has the ID 4 is Darth Vader.

A good representation of the question is the same statement without the answer part:
(What is) the name of the Star Wars character who has the ID 4?

The same relationship applies to a GraphQL query. Take a JSON data object and remove all the "answer" parts (the values), and you end up with a GraphQL query suitable to represent a question about that JSON data object.
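The mapping is mechanical, so it can be written as a few lines of code. The sketch below rebuilds query text from a JSON answer using a hypothetical `toQuery` helper, which is not part of any GraphQL library. It describes a list by its first item. It cannot recover field arguments such as the person ID, because those belong to the question, not the answer.

```javascript
// A value needs a sub-selection if it is an object, or a list of objects.
const needsSelection = (value) =>
  Array.isArray(value) ? needsSelection(value[0]) : value !== null && typeof value === "object";

// Turn a JSON answer into the question: keep every key, drop every value.
// Lists are described by their first item (assumes all items share one shape).
function toQuery(value, depth = 0) {
  const shape = Array.isArray(value) ? value[0] : value;
  const pad = "  ".repeat(depth);
  const lines = Object.entries(shape).map(([key, child]) =>
    needsSelection(child) ? `${pad}  ${key} ${toQuery(child, depth + 1)}` : `${pad}  ${key}`
  );
  return `{\n${lines.join("\n")}\n${pad}}`;
}

// The JSON data object from listing 1.11.
const answer = {
  data: {
    person: {
      name: "Darth Vader",
      birthYear: "41.9BBY",
      planet: { name: "Tatooine" },
      films: [
        { title: "A New Hope" },
        { title: "The Empire Strikes Back" },
        { title: "Return of the Jedi" },
        { title: "Revenge of the Sith" },
      ],
    },
  },
};

console.log(toQuery(answer.data));
```

Running it on the `data` part of listing 1.11 prints the query from table 1.2 without the `(ID: ·-·-·)` argument.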

Now, compare the GraphQL query with the UI view that uses it. Every element of the GraphQL query is used in the UI view, and every dynamic part that is used in the UI view appears in the GraphQL query.

This obvious mapping is one of the greatest powers of GraphQL. The UI view knows the exact data it needs, and extracting that requirement from the view code is fairly easy. You simply look for what variables are used in the view.

If you think about this in terms of multiple nested UI components, every UI component can ask for the exact part of the data that it needs, and the application data needs can be constructed by putting together these partial data needs. GraphQL provides a way for a UI component to define the partial data need via a feature called fragments. You will learn about GraphQL fragments in chapter 3.
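GraphQL's real mechanism for this is fragments. The sketch below illustrates only the composition principle and is not fragment syntax. Four hypothetical components each declare a partial data need as a nested object. A `mergeNeeds` helper, also hypothetical, combines them into the view's full need and requests shared fields only once.

```javascript
// Each hypothetical component declares only the fields it renders.
const PersonHeader = { needs: { person: { name: true, birthYear: true } } };
const PlanetBadge = { needs: { person: { planet: { name: true } } } };
const FilmList = { needs: { person: { films: { title: true } } } };
const PersonTitle = { needs: { person: { name: true } } }; // overlaps PersonHeader

// Deep-merge partial needs (leaf fields are `true`); shared fields appear once.
function mergeNeeds(...needs) {
  const merged = {};
  for (const need of needs) {
    for (const [field, sub] of Object.entries(need)) {
      const existing = merged[field];
      if (sub === true) {
        if (existing === undefined) merged[field] = true; // new leaf field
      } else {
        const base = existing && existing !== true ? existing : {};
        merged[field] = mergeNeeds(base, sub); // combine nested selections
      }
    }
  }
  return merged;
}

const viewNeeds = mergeNeeds(PersonHeader.needs, PlanetBadge.needs, FilmList.needs, PersonTitle.needs);
console.log(JSON.stringify(viewNeeds));
// {"person":{"name":true,"birthYear":true,"planet":{"name":true},"films":{"title":true}}}
```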

Furthermore, if you invert this mapping model, you find another powerful concept. If you have a GraphQL query, you know exactly how to use its response in the UI because the query will have the same structure as the response. You do not need to inspect the response to know how to use it, and you do not need any documentation about the API. It is all built in.

Star Wars data has a GraphQL API (see https://az.dev/swapi-graphql). You can use the GraphiQL editor available there to test a GraphQL query. We’ll talk about the GraphiQL editor in the next chapter, but you can go ahead and try to construct the example data person object. There are a few minor differences that you will learn about later in the book, but here is the official query you can use against this API to read the data requirement for the same view (with Darth Vader as an example).

Listing 1.13. GraphQL query for the Star Wars example | az.dev/gia
{
  person(personID: 4) {
    name
    birthYear
    homeworld {
      name
    }
    filmConnection {
      films {
        title
      }
    }
  }
}
If you are reading the print version of this book, you can copy the text of all usable code listings in the book at az.dev/gia. The query in listing 1.13 can be found there along with any listings that have a link in their caption.

Just paste this query in the editor area and click the Run button. This request will give you a response structure very close to what the view used. You expressed the data need in a way that is close to how you would express it in English, and you get all the data in a single network round trip.

Is GraphQL a REST killer?

When I first learned about GraphQL, I tweeted that "REST APIs can REST IN PEACE!" Joking aside, I don’t really think that GraphQL is a REST API "killer." I do think, however, that more people will pick GraphQL over REST for APIs used by web and mobile applications. REST APIs have their place, and I don’t think that place is for web and mobile applications.

I believe GraphQL will do to REST what JSON did to XML. XML is still pretty heavily used, but almost every web-based API I know of today uses the JSON format.

GraphQL offers many advantages over REST APIs, but let’s also talk about the challenges GraphQL brings to the table.

3. GraphQL problems

Perfect solutions are fairy tales. The flexibility that GraphQL introduces opens a door to some clear issues and concerns.

3.1. Security

A critical threat for GraphQL APIs is resource-exhaustion attacks (also known as denial-of-service attacks). A GraphQL server can be attacked with overly complex queries that consume all the server resources. It is very simple to query for deeply nested relationships (user → friends → friends → friends …) or use field aliases to ask for the same field many times. Resource-exhaustion attacks are not specific to GraphQL, but when working with GraphQL, you have to be extra careful about them.

This resource-exhaustion problem can also come from non-malicious client applications that have certain bugs or bad implementations. Remember that a GraphQL client is free to ask for whatever data it requires, so it might just ask for too much data at once.

There are some mitigations you can use. You can implement cost analysis on the query in advance and enforce limits on the amount of data that can be consumed. You can also implement a timeout to kill requests that take too long to resolve. In addition, since a GraphQL service is just one layer in any application stack, you can handle the rate-limit enforcement at a lower level under GraphQL.
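Here is a minimal sketch of checking a request before executing it. It assumes the incoming query has already been parsed into a nested selection object, with leaf fields set to `true`. Depth and total field count serve as a crude stand-in for a real cost model. The `checkLimits` name and the default limits are hypothetical.

```javascript
// How deeply a selection nests (a leaf field counts as depth 1).
function depthOf(selection) {
  let max = 0;
  for (const sub of Object.values(selection)) {
    max = Math.max(max, sub === true ? 1 : 1 + depthOf(sub));
  }
  return max;
}

// Total number of requested fields, a crude proxy for cost.
function fieldCount(selection) {
  let count = 0;
  for (const sub of Object.values(selection)) {
    count += sub === true ? 1 : 1 + fieldCount(sub);
  }
  return count;
}

// Reject a request before any resolver runs if it exceeds the limits.
function checkLimits(selection, { maxDepth = 5, maxFields = 100 } = {}) {
  const depth = depthOf(selection);
  if (depth > maxDepth) return { ok: false, reason: `depth ${depth} exceeds ${maxDepth}` };
  const fields = fieldCount(selection);
  if (fields > maxFields) return { ok: false, reason: `${fields} fields exceed ${maxFields}` };
  return { ok: true };
}

// Build a user -> friends -> friends -> ... selection with the given nesting.
function nestedFriends(levels) {
  let selection = { name: true };
  for (let i = 0; i < levels; i += 1) selection = { name: true, friends: selection };
  return { user: selection };
}

console.log(checkLimits(nestedFriends(2))); // { ok: true }
console.log(checkLimits(nestedFriends(10))); // { ok: false, reason: 'depth 12 exceeds 5' }
```

In a parsed query, every aliased copy of a field is a separate selection. Counting fields therefore also caps the alias trick.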

If the GraphQL API endpoint you are trying to protect is not public and is designed for internal use by your client applications (web or mobile), you can use an allow list approach and preapprove queries the server can execute. Clients can ask the server to execute preapproved queries using a unique query identifier. While this approach reintroduces some dependencies between servers and clients, automation strategies can be used to mitigate that issue. For example, you can give the frontend engineers the freedom to modify the queries and mutations they use in development and then automatically replace them with their unique IDs during deployment to production servers. Some client-side GraphQL frameworks are already testing similar concepts.
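Here is a sketch of the allow-list idea. At build time, each query text is hashed into an ID; SHA-256 via Node's built-in `crypto` module is an arbitrary choice. In production, the server accepts only those IDs. The `registerQueries` and `handleRequest` names and the request shape are hypothetical.

```javascript
const { createHash } = require("crypto");

// Build step: hash every query the frontend uses and record ID -> query text.
function registerQueries(queryTexts) {
  const allowList = new Map();
  for (const text of queryTexts) {
    const id = createHash("sha256").update(text).digest("hex");
    allowList.set(id, text);
  }
  return allowList;
}

// Server step: look up the preapproved text by ID and refuse anything else.
function handleRequest(allowList, request) {
  if (typeof request.query === "string") {
    return { status: 400, error: "Arbitrary query text is not accepted" };
  }
  const text = allowList.get(request.id);
  if (!text) return { status: 400, error: `Unknown query ID: ${request.id}` };
  return { status: 200, queryToExecute: text }; // hand off to the GraphQL executor
}

const personQuery = "{ person(personID: 4) { name birthYear } }";
const allowList = registerQueries([personQuery]);
const [personQueryId] = allowList.keys();

console.log(handleRequest(allowList, { id: personQueryId }).status); // 200
console.log(handleRequest(allowList, { query: "{ everything }" }).status); // 400
```

In development, engineers can keep sending full query text to a permissive server. The deploy step swaps each query for its ID, which is the kind of automation described above.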

Authentication and authorization are other concerns that you need to think about when working with GraphQL. Do you handle them before, after, or during a GraphQL resolve process?

To answer this question, think of GraphQL as a domain-specific language (DSL) on top of your backend data-fetching logic. It is just one layer that you could put between the clients and your actual data services. Think of authentication and authorization as another layer. GraphQL will not help with the actual implementation of the authentication or authorization logic. It is not meant for that. But if you want to put these layers behind GraphQL, you can use GraphQL to communicate the access tokens between the clients and the enforcing logic. This is very similar to the way authentication and authorization are usually implemented in REST APIs.
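Here is a sketch of that layering. The token is extracted once per request into a context object that resolvers receive, a pattern many GraphQL server libraries support. The field resolver then asks a separate authorization function for a decision. The `sessions` store, `verifyToken`, `canViewEmail`, the `email` field, and the resolver signatures are all hypothetical.

```javascript
// --- Auth layer (hypothetical; lives outside GraphQL) ---
const sessions = { "token-123": { userId: 1, role: "member" } };
const verifyToken = (token) => sessions[token] || null;
const canViewEmail = (viewer, user) =>
  viewer !== null && (viewer.role === "admin" || viewer.userId === user.id);

// --- Data layer (hypothetical in-memory users) ---
const users = {
  1: { id: 1, name: "Luke", email: "luke@example.com" },
  2: { id: 2, name: "Leia", email: "leia@example.com" },
};

// --- GraphQL layer: only carries the token and asks the auth layer ---
// Build a per-request context from an "Authorization: Bearer <token>" header.
function buildContext(headers) {
  const auth = headers.authorization || "";
  const token = auth.startsWith("Bearer ") ? auth.slice("Bearer ".length) : null;
  return { viewer: token ? verifyToken(token) : null };
}

// Resolvers receive the context; the email field defers the access decision.
const resolvers = {
  user: (args) => users[args.id],
  email: (user, context) => (canViewEmail(context.viewer, user) ? user.email : null),
};

const context = buildContext({ authorization: "Bearer token-123" });
console.log(resolvers.email(resolvers.user({ id: 1 }), context)); // luke@example.com
console.log(resolvers.email(resolvers.user({ id: 2 }), context)); // null
```

Returning `null` for a field the viewer may not see is one design choice. A server could instead return an error for that field.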

3.2. Caching and optimizing

One task that GraphQL makes a bit more challenging is clients' caching of data. Responses from REST APIs are a lot easier to cache because of their dictionary nature. A specific URL gives certain data, so you can use the URL itself as the cache key.

With GraphQL, you can adopt a similar basic approach and use the query text as a key to cache its response. But this approach is limited, is not very efficient, and can cause problems with data consistency. The results of multiple GraphQL queries can easily overlap, and this basic caching approach will not account for the overlap.

There is a brilliant solution to this problem. A graph query means a graph cache. If you normalize a GraphQL query response into a flat collection of records and give each record a global unique ID, you can cache those records instead of caching the full responses.

This is not a simple process, though. There will be records referencing other records, so you will be managing a cyclic graph. Populating and reading the cache will require query traversal. You will probably have to implement a separate layer to handle this cache logic. However, this method will be a lot more efficient than response-based caching.
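Here is a minimal sketch of that normalization. It assumes every object in a response includes `__typename` and `id` fields, so that `Type:id` can serve as its global unique ID. That is an assumption; real caching clients make the ID scheme configurable. The `normalize` and `read` helpers and the `{ __ref }` pointer format are hypothetical.

```javascript
// Flatten a response into records keyed by a global ID ("Type:id"). Objects
// with an identity are stored once and replaced by { __ref } pointers.
function normalize(value, store) {
  if (Array.isArray(value)) return value.map((item) => normalize(item, store));
  if (value === null || typeof value !== "object") return value; // scalar
  const fields = {};
  for (const [key, child] of Object.entries(value)) fields[key] = normalize(child, store);
  if (value.__typename === undefined || value.id === undefined) return fields; // no identity
  const key = `${value.__typename}:${value.id}`;
  store.set(key, { ...store.get(key), ...fields }); // merge with already-cached fields
  return { __ref: key };
}

// Reading walks the query's selection (leaf fields are `true`) and follows
// references only where the query asks. Records may reference each other in
// cycles, so blindly following every __ref could loop forever.
function read(value, selection, store) {
  if (Array.isArray(value)) return value.map((item) => read(item, selection, store));
  const record = value && value.__ref ? store.get(value.__ref) : value;
  const result = {};
  for (const [field, sub] of Object.entries(selection)) {
    result[field] = sub === true ? record[field] : read(record[field], sub, store);
  }
  return result;
}

const store = new Map();
// Two different queries whose results overlap on the same Planet record.
normalize({
  person: {
    __typename: "Person", id: "4", name: "Darth Vader",
    planet: { __typename: "Planet", id: "1", name: "Tatooine" },
  },
}, store);
normalize({ planet: { __typename: "Planet", id: "1", name: "Tatooine" } }, store);

console.log(store.size); // 2: one Person record and one shared Planet record
console.log(read({ __ref: "Person:4" }, { name: true, planet: { name: true } }, store));
// { name: 'Darth Vader', planet: { name: 'Tatooine' } }
```

Both responses resolve to the same `Planet:1` record. Updating that record once therefore updates every query result that uses it, a kind of consistency that caching by query text cannot provide.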

Another well-known problem you may encounter when working with GraphQL is commonly referred to as N+1 SQL queries: one query to fetch a list of N items, plus one more query per item. GraphQL query fields are designed to be standalone functions, and resolving those fields with data from a database might result in a new database request per resolved field. For simple REST API endpoint logic, it is easy to analyze, detect, and solve N+1 issues by enhancing the constructed SQL queries. For GraphQL dynamically resolved fields, it is not that simple.
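The following sketch shows where the name comes from. It uses a fake database that logs every query instead of running real SQL. The table contents, the `db` methods, and the resolver shapes are hypothetical.

```javascript
// Fake tables (hypothetical sample data).
const tables = {
  people: [
    { id: 1, name: "Luke Skywalker", planetId: 1 },
    { id: 4, name: "Darth Vader", planetId: 1 },
    { id: 5, name: "Leia Organa", planetId: 2 },
  ],
  planets: [{ id: 1, name: "Tatooine" }, { id: 2, name: "Alderaan" }],
};

// Fake database: records each query text instead of executing it.
const queryLog = [];
const db = {
  allPeople() {
    queryLog.push("SELECT * FROM people");
    return tables.people;
  },
  planetById(id) {
    queryLog.push(`SELECT * FROM planets WHERE id = ${id}`);
    return tables.planets.find((planet) => planet.id === id);
  },
};

// Naive resolvers: each field fetches its own data independently.
const resolvers = {
  people: () => db.allPeople(),
  planet: (person) => db.planetById(person.planetId),
};

// Resolving { people { name planet { name } } } field by field:
const result = resolvers.people().map((person) => ({
  name: person.name,
  planet: { name: resolvers.planet(person).name },
}));

console.log(queryLog.length); // 4 = 1 query for the list + 3 (N) queries for planets
```

Tatooine is fetched twice here, once for Luke and once for Darth Vader. This shows why caching helps in addition to batching.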

Luckily, Facebook is pioneering one possible solution to both the caching problem and the data-loading-optimization problem: it’s called DataLoader.

As the name implies, DataLoader is a utility you can use to read data from databases and make it available to GraphQL resolver functions. You can use DataLoader instead of reading the data directly from databases with SQL queries, and DataLoader will act as your agent to reduce the SQL queries you send to the database (figure 1.6).

Figure 1.6. DataLoader can optimize the requests between GraphQL and databases.

DataLoader uses a combination of batching and caching to accomplish that. If the same client request results in a need to ask the database about multiple things, DataLoader can consolidate these questions and batch-load their answers from the database. DataLoader also caches the answers and makes them available for subsequent questions about the same resources.
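The batching-plus-caching idea can be illustrated with a much-simplified loader. This is not DataLoader's source. Its shape loosely follows DataLoader's documented usage, where `new DataLoader(batchLoadFn)` creates a loader and `loader.load(key)` returns a promise. The `MiniLoader` class and the fake `planetsByIds` batch function are hypothetical. Keys requested in the same tick go out as one batch, and each key's promise is cached so repeated requests skip the database.

```javascript
// A much-simplified batching + caching loader (not the real DataLoader).
class MiniLoader {
  constructor(batchFn) {
    this.batchFn = batchFn; // (keys) => Promise of values in the same order as keys
    this.cache = new Map(); // key -> Promise of value
    this.queue = []; // pending { key, resolve, reject } for the next batch
  }

  load(key) {
    if (this.cache.has(key)) return this.cache.get(key); // cached: no new request
    const promise = new Promise((resolve, reject) => {
      this.queue.push({ key, resolve, reject });
      // Schedule one dispatch after the current synchronous work, so every
      // load() made in the same tick joins one batch. (The real library
      // schedules this more carefully.)
      if (this.queue.length === 1) queueMicrotask(() => this.dispatch());
    });
    this.cache.set(key, promise);
    return promise;
  }

  async dispatch() {
    const batch = this.queue;
    this.queue = [];
    try {
      const values = await this.batchFn(batch.map((item) => item.key));
      batch.forEach((item, i) => item.resolve(values[i]));
    } catch (error) {
      batch.forEach((item) => {
        this.cache.delete(item.key); // allow a later retry
        item.reject(error);
      });
    }
  }
}

// Fake batch query, standing in for SELECT * FROM planets WHERE id IN (...).
const batchLog = [];
const planets = { 1: { id: 1, name: "Tatooine" }, 2: { id: 2, name: "Alderaan" } };
async function planetsByIds(ids) {
  batchLog.push(ids);
  return ids.map((id) => planets[id] || null);
}

const planetLoader = new MiniLoader(planetsByIds);

// Three people resolve their planets in the same tick: 1, 1, and 2.
Promise.all([1, 1, 2].map((id) => planetLoader.load(id))).then((results) => {
  console.log(results.map((planet) => planet.name)); // [ 'Tatooine', 'Tatooine', 'Alderaan' ]
  console.log(batchLog); // [ [ 1, 2 ] ]
});
```

Three loads become a single batched query, and the duplicate key `1` is served from the cache.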

There are other SQL optimization strategies that you can use. For example, you can construct optimal join-based SQL queries by analyzing GraphQL requests. If you are using a relational database with native efficient capabilities to join tables of data and reuse previously parsed queries, then in many cases, a join-based strategy may be more efficient than ID-based batching. However, ID-based batching is much easier to implement.

3.3. Learning curve

Working with GraphQL involves a steeper learning curve than the alternatives. A developer writing a GraphQL-based frontend application has to learn the syntax of the GraphQL language. A developer implementing a GraphQL backend service has to learn a lot more than just the language: they have to learn the API syntax of a GraphQL implementation. They must also understand schemas and resolvers, among many other concepts specific to a GraphQL runtime.

This is less of an issue in REST APIs because they do not have a client language or require any standard implementations. You have the freedom to implement REST endpoints however you wish because you don’t have to parse, validate, and execute special language text.

4. Summary

  • The best way to represent data in the real world is with a graph data structure. A data model is a graph of related objects. GraphQL embraces this fact.

  • A GraphQL system has two primary components: the query language, which can be used by consumers of data APIs to request their exact data needs; and the runtime layer on the backend, which publishes a public schema describing the capabilities and requirements of data models. The runtime layer accepts incoming requests on a single endpoint and resolves incoming data requests with predictable data responses. Incoming requests are strings written with the GraphQL query language.

  • GraphQL is all about optimizing data communication between a client and a server. GraphQL allows clients to ask for the exact data they need in a declarative way, and it enables servers to aggregate data from multiple data storage resources in a standard way.

  • GraphQL has an official specification document that defines standard rules and practices that all implementers of GraphQL runtimes must adhere to.

  • A GraphQL service can be written in any programming language, and it can be conceptually split into two major parts: a structure that is defined with a strongly typed schema representing the capabilities of the API, and behavior that is naturally implemented with functions known as resolvers. A GraphQL schema is a graph of fields, which have types. This graph represents all the possible data objects that can be read (or updated) through the GraphQL service. Each field in a GraphQL schema is backed by a resolver function.

  • The difference between GraphQL and its previous alternatives is that it provides standards and structures to implement API features in maintainable, scalable ways. The alternatives lack such standards. GraphQL also solves many technical challenges like having to do multiple network round trips and deal with multiple data responses on the client.

  • GraphQL has some challenges, especially in the areas of security and optimization. Because of the flexibility it provides, securing a GraphQL API requires thinking about more vulnerabilities. Caching a flexible GraphQL API is also a lot harder than caching fixed API endpoints (as in REST APIs). The GraphQL learning curve is also steeper than that of many of its alternatives.