
Optimizing data fetching

This chapter covers

  • Caching and batching data-fetch operations

  • Using the DataLoader library with primary keys and custom IDs

  • Using GraphQL’s union type and field arguments

  • Reading data from MongoDB

Now that we have a GraphQL service with a multi-model schema, we can look at one of GraphQL’s most famous problems: the N+1 queries problem. We ended the previous chapter with a GraphQL query that fetches data from three database tables:

Listing 7.1. The N+1 query example
{
  needFeaturedList {
    # ·-·-·
    author {
      # ·-·-·
    }
    approachList {
      # ·-·-·
      author {
        # ·-·-·
      }
    }
  }
}

Because the GraphQL runtime traverses the tree field by field and resolves each field on its own as it goes, this simple GraphQL query resulted in a lot more SQL statements than necessary.

1. Caching and batching

To analyze a solution to this problem, let’s go back to the simpler query in listing 6.25:

Listing 7.2. The needFeaturedList query
{
  needFeaturedList {
    content
    author {
      id
      email
      name
      createdAt
    }
  }
}

If you remember, this query was issuing four SQL SELECT statements to the database for the seed data, which is an example of the N+1 problem (N being the three Need records). We saw how to use database join views to make the query execute a single SQL statement, but that solution is not ideal; it’s not easy to maintain or scale.

Another solution is to leverage the concepts of caching and batching for all the needed SQL statements in a single GraphQL operation:

  • Caching: At a minimum, we can cache any SQL statement issued and then use the cache the next time we need the exact same statement. If we ask the database about user #1, we should not ask it again about user #1; we can just reuse the previous answer. Doing this in the scope of a single request (from an API consumer) is a no-brainer, but you can also leverage longer-term, multi-request caching if you need to optimize things further. However, caching by itself is not enough. We also need to group queries asking for data from the same tables.

  • Batching: We can delay asking the database about a certain resource until we figure out the IDs of all the records that need to be resolved. Once these IDs are identified, we can use a single query that takes in a list of IDs and returns the list of records for them. This enables us to issue one SQL statement per table, which reduces the SQL statements needed for the simple query in listing 7.2 to just two: one for the azdev.needs table and one for the azdev.users table (see the sketch after this list).
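
Here’s a minimal sketch of these two ideas done by hand, assuming a hypothetical getUsersByIds function that issues one SQL statement for a list of IDs (DataLoader, which we’ll meet next, implements this pattern for us):

// A hand-rolled, request-scoped cache plus a batched lookup.
// getUsersByIds is hypothetical: it takes an array of IDs and
// resolves to the matching user records.
const userCache = new Map(); // create one per API request

async function loadUsers(ids) {
  const missingIds = ids.filter((id) => !userCache.has(id));
  if (missingIds.length > 0) {
    // Batching: one SQL statement for all the missing IDs
    const users = await getUsersByIds(missingIds);
    users.forEach((user) => userCache.set(user.id, user));
  }
  // Caching: every requested ID is now answered from the cache
  return ids.map((id) => userCache.get(id));
}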

Manually managing these caching and batching operations would still be a lot of work. However, this is the type of work that can be abstracted to a separate library. After releasing the GraphQL.js reference implementation, the Facebook team also released a reference implementation for such a library. They named it DataLoader. You can check its codebase at github.com/graphql/dataloader.

While the DataLoader JavaScript project originated at Facebook as another reference implementation (mirroring Facebook’s own internal data-loading library), it has since moved to the GraphQL Foundation and is now maintained by the GraphQL community. Despite being relatively young (it was released in 2015), it’s a stable and battle-tested project; it’s already part of thousands of open-source projects, and it gets downloaded millions of times each month.

DataLoader is a generic JavaScript utility library that can be injected into your application’s data-fetching layer to manage the caching and batching operations on your behalf.

To use DataLoader in the AZdev API project, we need to install it first:

$ npm i dataloader

This npm package has a default export which we usually import as DataLoader. This default export is a JavaScript class that we can use to instantiate a DataLoader instance. The DataLoader class constructor expects a function as its argument and that function is expected to do the data fetching. This function is known as the batch loading function because it expects an array of key IDs and it should fetch all records associated with these IDs in one batch action and return these records as an array that has the same order as the array of input IDs.

For example, to create a loader responsible for loading user records, here’s one possible way to do it (don’t add this code yet):

Listing 7.3. The DataLoader syntax
import DataLoader from 'dataloader';

const userLoader = new DataLoader(
  ids => getUsersByIds(ids)           (1)
);
1 The ids argument is an array and getUsersByIds is the batch loading function that takes an array of IDs and returns an array of user records representing these IDs (in order).

Once you have the logic of fetching a list of user records based on a list of ID values, you can start using the userLoader object to fetch multiple users. For example, imagine that a single request in your API application needs to load information about users in the following order:

Listing 7.4. Using a DataLoader object
const promiseA = userLoader.load(1);
const promiseB = userLoader.load(2);

// await on something async

const promiseC = userLoader.load(1);

DataLoader will take care of batching the first two statements into a single SQL statement because they happened in the same frame of execution (known in Node.js as a single tick of the event loop).

For the third statement, DataLoader will use its memoization cache of .load() calls. User #1 has already been fetched from the database (in a previous frame of execution, but still within the same request), so fetching it again would be redundant.
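
You can see both behaviors in a small, self-contained script; the getUsersByIds stub here is a stand-in for a real batch loading function:

import DataLoader from 'dataloader';

// Stub batch loading function that logs each batch it receives
const getUsersByIds = async (ids) => {
  console.log('Batch-loading users:', ids);
  return ids.map((id) => ({ id }));
};

const userLoader = new DataLoader((ids) => getUsersByIds(ids));

// Same frame of execution: one batch call with [ 1, 2 ]
const [userA, userB] = await Promise.all([
  userLoader.load(1),
  userLoader.load(2),
]);

// Later frame: served from the memoization cache, no new batch call
const userC = await userLoader.load(1);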

This minimization of SQL statements and elimination of redundant loads relieves pressure on your data storage services, and it also results in fewer objects being created overall, which may relieve memory pressure on your API application as well.

While you can do the batching and caching manually, DataLoader enables you to decouple the data-loading logic in your application without sacrificing the performance of the caching and batching optimizations. DataLoader instances present a consistent "API" over your various data sources (PostgreSQL, MongoDB, and any others).

This allows you to focus on your application’s logic and safely distribute its data-fetching requirements without worrying about maintaining minimal requests to your databases and other sources of data.

1.1. The batch loading function

A batch loading function like getUsersByIds in listing 7.3 accepts an array of IDs (or generic keys) and it should return a promise object which resolves to an array of records. To be compatible with DataLoader, the resulting array must be the exact same length as the input array of IDs and each index in the resulting array of records must correspond to the same index in the input array of IDs.

For example, if the getUsersByIds batch function was provided the input array of IDs [ 2, 5, 3, 1 ], that function will need to issue one SQL statement to fetch all user records for these IDs. Here’s one way to do that in PostgreSQL:

Listing 7.5. Using the SQL IN operator
SELECT *
FROM azdev.users
WHERE id IN (2, 5, 3, 1);

If an ORDER BY clause is not specified in a SELECT statement, the database picks the most efficient way to satisfy the statement, and the order of the returned rows is not guaranteed.

For the sake of this example, let’s assume that, for this SQL statement, the database returned three user records (instead of four), in the following order:

{ id: 5, name: 'Max' }   (1)
{ id: 1, name: 'Jane' }
{ id: 2, name: 'Mary' }  (2)

1 The results order is different from the order of IDs in the input array.
2 Only three records came back; the database did not have a user corresponding to the input ID 3.

The getUsersByIds batch loading function CANNOT use the result of that SQL statement as is. It needs to reorder the records and ensure that each item aligns with the original order of IDs: [ 2, 5, 3, 1 ]. If an ID has no corresponding record in the result, it should be represented with a null value:

[
  { id: 2, name: 'Mary' },
  { id: 5, name: 'Max' },
  null,
  { id: 1, name: 'Jane' }
]

In the previous chapter, we wrote a users.byId function in lib/db-api/pg.js. Let’s convert that into a batch loading function and see what we need to do to make it DataLoader-compatible.

First, let’s rename it to users.byIds and make it take an array of ids instead of a single value and return an array of user records instead of a single user record. We’ll need to change the userInfo SQL statement as well.

There are many ways to fetch multiple records from the database using an array of IDs. The easiest way is to use the ANY PostgreSQL comparison construct because we can feed it an array directly (so no array manipulation would be needed).

The same query in listing 7.5 can be written with ANY this way:

Listing 7.6. Using ANY with an array to fetch multiple records
SELECT *
FROM azdev.users
WHERE id = ANY ('{2, 5, 3, 1}');

Note how the array syntax in PostgreSQL uses curly brackets instead of square brackets. This might look a bit odd, but we don’t have to deal with it ourselves; the driver we’re using takes care of the conversion for us.
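
For example, with the pg driver you can bind a regular JavaScript array to $1, and the driver serializes it into the curly-bracket form on the wire (a quick sketch, assuming the pgPool object we already have):

// $1 is bound to the whole JavaScript array; the pg driver converts
// it to the PostgreSQL array literal '{2,5,3,1}' on the wire.
const pgResp = await pgPool.query(
  'SELECT * FROM azdev.users WHERE id = ANY ($1)',
  [[2, 5, 3, 1]]
);
console.log(pgResp.rows);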

With that knowledge, here are all the changes we need to make a DataLoader-compatible users.byIds batch loading function:

Listing 7.7. Changes in lib/db-api/pg.js
const sqls = {
  usersInfo: 'SELECT * FROM azdev.users WHERE id = ANY ($1)',
  // ·-·-·
};

const pgApiWrapper = async () => {
  // ·-·-·
  return {
    // ·-·-·
    users: {
      byIds: async (ids) => { (1)
        const pgResp = await pgQuery(sqls.usersInfo, [ids]); (2)
        return ids.map((id) =>
          pgResp.rows.find((row) => id === row.id)
        );
      },
    },
  };
};
1 The ids argument here is expected to be an array of numbers
2 The array [ids] here is the array of values passed to the pg driver. We only use one value ($1)

Note how I renamed variables to make it clear that this function now works with arrays rather than single values. Note also how I used a .map call on the input array to ensure that the output array has the exact same length and order. DataLoader will not work properly if you don’t do that.

The .map with .find combination is NOT the most efficient way to accomplish the task in listing 7.7, but it is a simple one. You should consider converting the pgResp.rows array into an object and doing a constant-time lookup within the .map loop. I’ll leave that as an exercise for you; one possible shape is sketched below. Look at the final codebase for the project (az.dev/contribute) to see all the optimizations I ended up doing for this part.
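
Here’s that possible shape (a sketch, not necessarily what the final codebase does):

byIds: async (ids) => {
  const pgResp = await pgQuery(sqls.usersInfo, [ids]);
  // Index the rows by id once (O(n))...
  const rowsById = {};
  pgResp.rows.forEach((row) => {
    rowsById[row.id] = row;
  });
  // ...then do a constant-time lookup per input ID
  return ids.map((id) => rowsById[id]);
},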

This batch loading function is ready. Let’s use it.

1.2. Defining and using a DataLoader instance

DataLoader caching is not meant to be part of your application-level caching that’s shared among requests. It’s meant to be a simple memoization to avoid repeatedly loading the same data in the context of a single request in your application.

To do that, you should initialize a loader object for each request in your application and use it only for that request.

Since we’re using the Express.js framework to manage requests to the AZdev API, to make the "users loader" scoped to a single request, we can define it inside the "listener function" of the server.use call that we have in lib/server.js.

However, the code currently delegates the whole listener function argument to the graphqlHTTP higher order function. We’ll need to change that. Here’s one way of doing so:

Listing 7.8. Changes in the server.use call in lib/server.js
  server.use(
    '/',
    (req, res) => {
      const loaders = {
        users: new DataLoader((ids) => pgApi.users.byIds(ids)),
      };
      graphqlHTTP({
        schema,
        context: { pgApi, loaders },
        graphiql: true,
      })(req, res);
    }
  );

This change introduces a new function as the listener function for server.use, defines the users loader within this new request-scoped context, and then delegates the rest of the work back to the graphqlHTTP function. With that, we’re now able to make the new loaders object part of the GraphQL context object.

Note how I made loaders into an object that has the users DataLoader instance as a property because we will eventually be introducing more loaders for more database models.

Also, note how the graphqlHTTP function is a higher-order function that returns another function. The returned function expects the req and res arguments (coming from the Express listener function). We are basically doing the same thing we had before, but now that we’ve introduced a wrapper listener function, we need to pass the req and res objects manually.

Note how I defined the loaders object for both POST and GET requests. Ideally, it should be defined only for POST requests. I’ll leave that part for you to optimize; a possible direction is sketched below.
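
A sketch of that direction, assuming we only want the loaders for POST requests (note that this would leave GET-based queries without loaders, so treat it as a starting point, not a finished solution):

server.use('/', (req, res) => {
  // Only build the loaders for POST requests; the initial GET that
  // renders the GraphiQL IDE page doesn't need them.
  const loaders =
    req.method === 'POST'
      ? {
          users: new DataLoader((ids) => pgApi.users.byIds(ids)),
        }
      : {};
  graphqlHTTP({
    schema,
    context: { pgApi, loaders },
    graphiql: true,
  })(req, res);
});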

That’s it for the loaders initializing. Each time there is a request, we’re creating a DataLoader instance for the users model that will make use of the byIds batch loading function that we have prepared. Let’s now replace the manual direct fetching of users in the codebase so far with this brand new DataLoader instance. We need to modify the two GraphQL types that previously used users.byId():

1) The Need type:

Listing 7.9. Changes in the Need type in lib/schema/types/need.js
const Need = new GraphQLObjectType({
  name: 'Need',
  fields: {
    // ·-·-·

    author: {
      type: new GraphQLNonNull(User),
      resolve: (source, args, { loaders }) =>
        loaders.users.load(source.user_id),
    },

    // ·-·-·
  },
});

2) The Approach type:

Listing 7.10. Changes in the Approach type in lib/schema/types/approach.js
const Approach = new GraphQLObjectType({
  name: 'Approach',
  fields: {
    // ·-·-·

    author: {
      type: new GraphQLNonNull(User),
      resolve: (source, args, { loaders }) =>
        loaders.users.load(source.user_id),
    },

    // ·-·-·
  },
});

The changes to these types are identical. We use the new loaders object in the resolver’s context object (instead of the previous pgApi object) and use the .load call on the users DataLoader instance.

DataLoader will take care of the rest! When multiple .load calls are made in the same execution context while the GraphQL query is being resolved, DataLoader will batch the calls. It’ll prepare an array from all the loaded IDs and execute the batch loading function just once. It’ll then use the response for that single call to satisfy all the users data requirements that were made in the query.

If we try the same GraphQL query in listing 7.2 now while tailing the PostgreSQL logs, here’s the related excerpt we’ll see:

LOG:  statement: SELECT * FROM azdev.needs WHERE is_featured = true;
LOG:  execute <unnamed>: SELECT * FROM azdev.users WHERE id = ANY ($1)
DETAIL:  parameters: $1 = '{1}'     (1)
1 "1" is the ID value for the user I used in the seed data. You might have a different value.

Note how the parameter value is {1}, which represents an array in PostgreSQL. More importantly, note how only ONE SQL statement was issued for the users table (instead of three previously).

That’s a great value we added with just a few lines of code but to appreciate it more, let’s exclusively load the other ID-based data fetching through DataLoader’s instances. Remember the GraphQL query that made 11 SQL statements at the end of Chapter 7? Let’s see how many SQL statements it’ll make after converting all data-fetching logic to go through DataLoader instances.

1.3. The loader for the approachList field

The other ID-based fetching we are doing so far is the needs.approachList one in lib/db-api/pg.js:

Listing 7.11. The needs.approachList function in lib/db-api/pg.js
const pgApiWrapper = async () => {
  // ·-·-·

  return {
    needs: {
      // ·-·-·
      approachList: async (needId) => {
        const pgResp = await pgQuery(sqls.approachesForNeed, [
          needId,
        ]);
        return pgResp.rows;
      },
    },
    // ·-·-·
  };
};

This one is a bit different from users.byIds, as it takes a needId and returns an array of Approach records. This means when we switch it to work with an array of keys instead of a single value, it will take an array of needIds, and it should return an array of arrays (each representing the list of approaches for one need).

DataLoader, in this case, is concerned only with the order of the top-level array. The order of the items in the inner arrays is an application-level concern.
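
In other words, the contract looks like this (hypothetical values):

// Input: one needId per slot
const needIds = [10, 20, 30];

// Output: one inner array per slot, in the same top-level order.
// Need 20 has no approaches, so its slot is an empty array.
const approachLists = [
  [{ id: 1, need_id: 10 }, { id: 2, need_id: 10 }],
  [],
  [{ id: 3, need_id: 30 }],
];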

The first thing we need to change is the approachesForNeed SQL statement. We now need it to work with an array of need_id values. We can use the same ANY comparison construct:

Listing 7.12. Changes in lib/db-api/pg.js
const sqls = {
  // ·-·-·
  approachesForNeeds:
    'SELECT * FROM azdev.approaches WHERE need_id = ANY ($1)',
};

Given a list of need_id values, this new SELECT statement will return ALL approaches for all these need_id values.

To keep things simple, I have not done any ordering in the approachesForNeeds statement. As an exercise, once you’re done with this chapter, try to extend the GraphQL schema and provide a way for the API consumer to order the list of approaches: newest first, oldest first, or most popular first (based on the voteCount field). A possible starting point is sketched below.
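
Here’s a hypothetical starting point for the SQL side of that exercise (the GraphQL argument plumbing is left out):

// Map each supported order (a hypothetical ENUM) to an ORDER BY clause.
// Interpolating is safe here only because the values come from this
// fixed map, never from user input.
const orderByClauses = {
  NEWEST: 'created_at DESC',
  OLDEST: 'created_at ASC',
  MOST_POPULAR: 'vote_count DESC',
};

const approachesForNeedsSql = (orderBy) =>
  `SELECT * FROM azdev.approaches
   WHERE need_id = ANY ($1)
   ORDER BY ${orderByClauses[orderBy]}`;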

To use this new SELECT statement, we need to make the following changes to the pgApiWrapper function:

Listing 7.13. Changes in lib/db-api/pg.js
const pgApiWrapper = async () => {
  // ·-·-·

  return {
    needs: {
      // ·-·-·
      approachLists: async (needIds) => {
        const pgResp = await pgQuery(sqls.approachesForNeeds, [
          needIds,
        ]);
        return needIds.map((needId) =>
          pgResp.rows.filter((row) => needId === row.need_id)
        );
      },
    },
    // ·-·-·
  };
};

To keep the top-level array order matching the input array order, I once again used a .map call on the input needIds array, but this time I used .filter on the set of returned rows. The filter call groups the items in the response by the needId value. The returned result is an array of approach arrays.

The needs.approachLists batch loading function is now compatible with DataLoader. To use it, we instantiate a new loader instance in lib/server.js:

Listing 7.14. Changes in lib/server.js
    const loaders = {
      users: new DataLoader((ids) => pgApi.users.byIds(ids)),
      approachLists: new DataLoader((needIds) =>
        pgApi.needs.approachLists(needIds)
      ),
    };

Then we use this new instance in the GraphQL types that previously used a direct database fetch to list approaches. The only type that did that is the Need type:

Listing 7.15. Changes in lib/schema/types/need.js
const Need = new GraphQLObjectType({
  name: 'Need',
  fields: {
    // ·-·-·

    approachList: {
      type: new GraphQLNonNull(
        new GraphQLList(new GraphQLNonNull(Approach))
      ),
      resolve: (source, args, { loaders }) =>
        loaders.approachLists.load(source.id),
    },
  },
});

That should do it. Go ahead and test the same query we tested at the end of the previous chapter while tailing the PostgreSQL logs. Here’s the related excerpt from mine:

LOG:  statement: SELECT * FROM azdev.needs WHERE is_featured = true;
LOG:  execute <unnamed>: SELECT * FROM azdev.users WHERE id = ANY ($1)
DETAIL:  parameters: $1 = '{1}'
LOG:  execute <unnamed>: SELECT * FROM azdev.approaches WHERE need_id = ANY ($1)
DETAIL:  parameters: $1 = '{1,2,3}'

The key thing to notice here is how we’re now using a single SQL query to fetch the approaches for all three featured needs. The same single query would be used if we had 100 or 1,000 featured needs.

With the new DataLoader instances in place, the query that used 11 SQL statements before now uses only three (one statement per database table). We did not need to do any joins or field renaming. This is a big win!

This win also makes the server more resilient to denial-of-service attacks that use intentionally complicated queries. For example, imagine we used GraphQL’s alias concept to ask for the approaches data multiple times, like this:

Listing 7.16. An intentionally complicated query example
{
  needFeaturedList {
    id
    author {
      id
    }
    a1: approachList {
      id
      author {
        id
      }
    }
    a2: approachList {
      id
      author {
        id
      }
    }
    a3: approachList {
      id
      author {
        id
      }
    }
  }
}

The DataLoader instances would take care of not going to the database multiple times per alias. This intentionally complicated query would still execute only three statements over the wire. We should certainly put more protection layers between public queries and the backend schema, but it’s good to know that if something slips through the cracks, we have some level of optimization to fall back on.

Note how we did not need to optimize the featuredNeeds SQL statement, because it does not depend on IDs (so no batching is needed). However, we can still use DataLoader to leverage caching for any query asking for featured needs. Let’s do that next.

2. Using DataLoader with custom IDs for caching

Although a DataLoader batch loading function is often associated with a list of input IDs, you don’t need actual IDs coming from primary key columns in the database. You can come up with your own ID-to-result association and use DataLoader with that custom map. This is usually helpful for leveraging the caching side of DataLoader. For example, you can come up with certain SQL statements, give each statement a unique label, and make that label one of the "IDs" you use with a DataLoader instance.

We have many examples in the GraphQL schema we designed that could benefit from that. Let’s go through two of them: the needFeaturedList field and the search field.

2.1. The needFeaturedList field

Let’s test how many SQL statements the following GraphQL query will currently issue:

Listing 7.17. An example of a query using multiple aliases
{
  a1: needFeaturedList {
    id
  }
  a2: needFeaturedList {
    id
  }
  a3: needFeaturedList {
    id
  }
  a4: needFeaturedList {
    id
  }
}

Can you guess?

Since we have not used DataLoader for needFeaturedList, this GraphQL query issues the exact same SELECT statement four times. Here’s the related excerpt from my PostgreSQL logs:

LOG: statement: SELECT * FROM azdev.needs WHERE is_featured = true;
LOG: statement: SELECT * FROM azdev.needs WHERE is_featured = true;
LOG: statement: SELECT * FROM azdev.needs WHERE is_featured = true;
LOG: statement: SELECT * FROM azdev.needs WHERE is_featured = true;

We can leverage DataLoader to cache the data response it gets the first time it loads this list and then not go back to the database for the same request (of featured needs). However, DataLoader is wired to fetch a value for a key. You can think of the "value" here as the list of featured needs, but there is no natural key in this case. To make this database statement work with DataLoader, we need to come up with a custom "key" for this SELECT statement. A key here is just any unique label. I’ll use the label "featured" for this one.

Let’s do this change with a top-down approach this time (so far we’ve been doing a bottom-up approach). The type that needs to be resolved with featured needs is the root Query type. Instead of issuing pgApi.needs.allFeatured(), let’s assume we have a needsByTypes loader that can fetch any list of Need records by a certain type, and let’s fetch the featured type in there:

Listing 7.18. Changes in lib/schema/index.js
const QueryType = new GraphQLObjectType({
  name: 'Query',
  fields: {
    // ·-·-·
    needFeaturedList: {
      type: new GraphQLNonNull(
        new GraphQLList(new GraphQLNonNull(Need))
      ),
      resolve: async (source, args, { loaders }) => {
        return loaders.needsByTypes.load('featured');
      },
    },
  },
});

I often start planning for a change in the codebase just like this. I find it helpful to think about the new objects and functions I need and USE them before I write them. I think this helps me come up with better and more practical changes.

Let’s now write the needsByTypes loader. We’ll need to add it to the listener function (in lib/server.js):

Listing 7.19. Changes in lib/server.js
    const loaders = {
      // ·-·-·
      needsByTypes: new DataLoader((types) =>
        pgApi.needs.byTypes(types)
      ),
    };
    graphqlHTTP({
      schema,
      context: { loaders },
      graphiql: true,
    })(req, res);
  });

Again, I’ve used a byTypes property that does not exist on pgApi.needs yet. This is my plan for the new batch loading function. We’ll write that next.

Note how I removed the pgApi object from the context. We don’t need to query the database directly anymore. All DB communication should happen through a loader.

The new byTypes batch loading function is the interesting piece in this case. It currently supports loading only the "featured" type, but we still have to write it in a way that accepts an array of types and returns an array of results associated with those types:

Listing 7.20. Changes in lib/db-api/pg.js
const pgApiWrapper = async () => {
  const { pgPool } = await pgClient();

  const pgQuery = (text, params) => pgPool.query(text, params);

  return {
    needs: {
      byTypes: async (types) => {
        const results = types.map(async (type) => {
          if (type === 'featured') {
            const pgResp = await pgQuery(sqls.featuredNeeds);
            return pgResp.rows;
          }
          throw Error('Unsupported type');
        });
        return Promise.all(results);
      },
      // ·-·-·
    },
    // ·-·-·
  };
};

Note how the SQL query is within the .map call, making the map function return pending promises. That’s why I needed to wrap the returned result with a Promise.all call. Depending on what other types we end up adding (or not adding), this particular way of fetching within a map call can be optimized. It’s okay as is for now because we’re really just faking the batching nature of this list.

That’s it. No matter how many aliases you add to fetch the needFeaturedList root field, DataLoader will only ask the database about that list once.

With DataLoader in the stack, we can now continue implementing other fields in queries and either use existing loaders or define new ones as needed.

2.2. The search field

Another SQL statement that we can put a label on is the one we need to support the root search field:

Listing 7.21. The SDL text for NeedOrApproach and the search field
union NeedOrApproach = Need | Approach

type Query {
  # ·-·-·
  search(term: String!): [NeedOrApproach!]!
}

This feature introduces two concepts that we’re implementing for the first time: the GraphQL union type, which we named NeedOrApproach, and field arguments (the search field’s term argument).

Let’s start with the union type. As the name implies, the NeedOrApproach type means that an object of that type will be either a Need object or an Approach object. To define this special type, we use the GraphQLUnionType constructor and give it a list of all the possible types. In our case, that list has only 2 items but a union type can represent any number of types.

Listing 7.22. The GraphQLUnionType syntax
import { GraphQLUnionType } from 'graphql';

// import all possible types

const T1OrT2OrT3 = new GraphQLUnionType({
  name: 'TypeName',
  types: [T1, T2, T3],
  resolveType(value) {
    // What type is actually used?
    // Is it T1, T2, or T3?
  }
});

The root search field will be defined as a list of the new NeedOrApproach type. However, for each row in the search result, the union type needs a way to determine which type is actually used; it needs to tell the API consumer whether a search result is a Need object or an Approach object. We do that using the resolveType configuration property, which is a function whose first argument is the object represented by the type. In our case, the resolveType method will be called for each search result. Let’s design the search result data to have a type property that holds a 'need' or 'approach' string value.

Here’s one implementation of the NeedOrApproach type that’s based on this plan. Put this in lib/schema/types/need-or-approach.js

Listing 7.23. New file: lib/schema/types/need-or-approach.js
import { GraphQLUnionType } from 'graphql';

import Need from './need.js';
import Approach from './approach.js';

const NeedOrApproach = new GraphQLUnionType({
  name: 'NeedOrApproach',
  types: [Need, Approach],
  resolveType(obj) {
    if (obj.type === 'need') {
      return Need;
    }
    if (obj.type === 'approach') {
      return Approach;
    }
  },
});

export default NeedOrApproach;

We can now use this new type to define the root search field. We also need to define the arguments this field expects; we designed it to expect a term value, which is a string. To resolve the search field, let’s assume that we have a loader named search that expects a search term. Here’s one possible implementation of the field:

Listing 7.24. Changes in lib/schema/index.js
import NeedOrApproach from './types/need-or-approach.js';

const QueryType = new GraphQLObjectType({
  name: 'Query',
  fields: {
    // ·-·-·

    search: {
      type: new GraphQLNonNull(
        new GraphQLList(new GraphQLNonNull(NeedOrApproach))
      ),
      args: {
        term: { type: new GraphQLNonNull(GraphQLString) },  (1)
      },
      resolve: async (source, { term }, { loaders }) => {   (2)
        return loaders.search.load(term);                   (3)
      },
    },
  },
});
1 This is where we define the name and type of a field argument
2 When a consumer passes values for arguments, these values are captured in an object that is passed as the second argument to each resolve method (commonly named args)
3 We read the value the consumer used for the term field argument out of the resolve method’s args object (destructured here)

Following this top-down analysis, we now need to define a loader instance named search. But let me pause here and ask a question: do we really need a caching/batching loader for a search query?

Probably not. It’s unlikely an API consumer will do multiple searches in one query and unless they are intentionally trying to overload the API service, it’s unlikely they’ll search for the exact same term many times!

However, as we’ve seen in the needFeaturedList, using a loader provides some form of protection against bad queries (whether they were malicious or not). Imagine a UI application with a bug that caused a search query to repeat 100 times in a GraphQL request. You don’t want your API server to issue 100 full-text search SQL statements to the database in that case.

Another reason to use a loader for this case is simply consistency. DataLoader is now an abstraction layer in our stack and all database communication should happen through it. Mixing some direct database communication here would be a code smell.

Alright, let’s implement the loader. Its batch loading function will take a list of search query terms and it needs to do a full-text search operation for each.

Let’s assume that the pgApi module has a search method to do the SQL communication. However, remember that a search operation needs to be aware of WHO is performing the search. We’ve designed the Needs module to have ownership and private entries and when an API consumer sends a search query we need to search through all public Need records and all the private ones that belong to the current consumer. However, instead of making the search GraphQL field aware of who is using it, let’s plan on supporting the same search field under the viewer field which is dedicated to authenticated users.

Here’s what I came up with for the loader definition based on this plan:

Listing 7.25. Changes in lib/server.js
const startGraphQLWebServer = async () => {
  // ·-·-·

  server.use('/', (req, res) => {
    const loaders = {
      // ·-·-·
      search: new DataLoader((terms) =>
        pgApi.search(terms)
      ),
    };
    // ·-·-·
  });

  // ·-·-·
};

The final piece of this puzzle is the pgApi.search method, and that’s where all the full-text search logic is going to live. Luckily, PostgreSQL has built-in support for full-text search, but before I show you how to use it, we need a way to search two tables here. Can we do that with just a single SQL statement?

We can use the PostgreSQL UNION operator, which combines the result sets of multiple queries into a single result. Its syntax is simple:

query1 UNION query2

However, the queries must be "union-compatible". This means they must have the same number of columns and the columns must have compatible data types.

The columns we have on the Needs and Approaches tables are slightly different. To support all the fields we defined so far in the GraphQL schema, we need to include the tags and vote_count columns, each of which exists in one table but not the other.

To keep the single-SQL-statement constraint for this feature, we’ll need to write one SELECT query per table and "fake" the missing columns to make the two queries union-compatible. We can simply include null as field_name for each missing column. Here’s the UNION statement I came up with:

Listing 7.26. A workaround to make two different queries union-compatible
SELECT id, content, tags, null as vote_count,
       user_id, created_at, 'need' as type
FROM azdev.needs
UNION
SELECT id, content, null as tags, vote_count,
       user_id, created_at, 'approach' as type
FROM azdev.approaches

Note how the two queries have the exact same set of columns, although they are for two different tables. This is a workaround for the union-compatibility issue, but I think it’s a worthy compromise to keep the single-SQL-statement constraint.

However, we need to improve this statement to account for public and private Need records. It should include only public Need records (and their approaches) and not private Need records (or their approaches). Here’s one way to do this filtering:

Listing 7.27. Filtering Needs and Approaches
SELECT id, content, tags, null as vote_count,
       user_id, created_at, 'need' as type
FROM azdev.needs
WHERE is_public = true
UNION
SELECT a.id, a.content, null as tags, vote_count,
       a.user_id, a.created_at, 'approach' as type
FROM azdev.approaches a
JOIN azdev.needs n ON (a.need_id = n.id)
WHERE n.is_public = true

That’s our searchable view. We can now use PostgreSQL full-text search operators to find matches within this view and rank them based on relevance. Here’s the final query I came up with for this feature and how I wired it in the batch-loading function:

Listing 7.28. Changes in lib/db-api/pg.js
const sqls = {
  // ·-·-·
  search: `
    SELECT id, content, tags, user_id, created_at, type,
    ts_rank(
      to_tsvector(content),
      websearch_to_tsquery($1)
    ) as rank
    FROM (
      SELECT id, content, tags, null as vote_count,
             user_id, created_at, 'need' as type
      FROM azdev.needs
      WHERE is_public = true
      UNION
      SELECT a.id, a.content, null as tags, vote_count,
             a.user_id, a.created_at, 'approach' as type
      FROM azdev.approaches a
      JOIN azdev.needs n ON (a.need_id = n.id)
      WHERE n.is_public = true
    ) search_view
    WHERE to_tsvector(content) @@ websearch_to_tsquery($1)
    ORDER BY rank DESC
  `,
};

const pgApiWrapper = async () => {
  // ·-·-·

  return {
    // ·-·-·
    search: async (searchTerms) => {
      const results = searchTerms.map(async (searchTerm) => {
        const pgResp = await pgQuery(sqls.search, [
          searchTerm,
        ]);
        return pgResp.rows;
      });
      return Promise.all(results);
    },
  };
};

I used the ts_rank, to_tsvector, and websearch_to_tsquery functions with the @@ operator to perform the full-text search here. These are just one set of the many functions and operators that can be used to perform and optimize such a search, but they’re beyond the scope of this book. You can read more about PostgreSQL full-text search capabilities at jscomplete.com/pg-fts.

We can test now! Here’s an example of how to query the new search field in GraphQL:

Listing 7.29. An example query to test the search field
{
  search(term: "git or sum") {
    __typename
    ... on Need {
      id
      content
    }
    ... on Approach {
      id
      content
    }
  }
}

This search should return three records from the seed data: two Need records and one Approach record.


Note the power of the PostgreSQL text-search capabilities we used: the example uses the or keyword, and the search results are ranked based on relevance. Try a few other search terms and explore the many other features we’ve enabled.

Let’s now implement the field that we’ve put on hold in the last chapter, the detailList field under an Approach record. I’ve delayed implementing this one because of its complexity and because we should really do it through DataLoader. I think we’re ready for it.

3. Using DataLoader with MongoDB

Since we’ve decided to store the dynamic details on an Approach record in MongoDB, we need to make a DataLoader instance that fetches data from there. This is the first time we’ll be reading data from MongoDB so we need to create an api object for MongoDB, instantiate it the main server, and use it define a DataLoader instance.

Let’s do this one with a top-down approach as well. Similar to how we named objects for PostgreSQL and where we stored its modules, let’s come up with a mongoApi module, place it in lib/db-api/mongo.js, and assume that it has a batch loading function to load a list of details objects given a list of approach IDs.

Here are the changes I came up with for lib/server.js (mirroring what’s there for PostgreSQL):

Listing 7.30. Changes in lib/server.js
// ·-·-·
import mongoApiWrapper from './db-api/mongo.js';

const startGraphQLWebServer = async () => {
  const pgApi = await pgApiWrapper();
  const mongoApi = await mongoApiWrapper();
  const server = express();

  server.use('/', (req, res) => {
    const loaders = {
      // ·-·-·
      detailLists: new DataLoader((approachIds) =>
        mongoApi.approaches.detailLists(approachIds)
      ),
    };
    // ·-·-·
  });

  // ·-·-·
};

The new mongoApi module will host all the interactions with MongoDB. Note how I plan to group interactions under the main collection each one uses (approaches.detailLists instead of approachesDetailLists). Although we only have one collection in MongoDB, there is great value in keeping things consistent.

Here’s the implementation I came up with for the new mongoApi module (also mirroring what we have so far for lib/db-api/pg.js). Put this in lib/db-api/mongo.js:

Listing 7.31. New file: lib/db-api/mongo.js
import mongoClient from '../db-clients/mongo.js';

const mongoApiWrapper = async () => {
  const { mdb } = await mongoClient();

  const mdbFindDocumentsByField = ({
    collectionName,
    fieldName,
    fieldValues,
  }) =>
    mdb
      .collection(collectionName)
      .find({ [fieldName]: { $in: fieldValues } })
      .toArray();

  return {
    approaches: {
      detailLists: async (approachIds) => {
        // TODO: Use mdbFindDocumentsByField to
        // implement the batch loading logic here
      },
    },
  };
};

export default mongoApiWrapper;

Note how I placed the mongo driver’s logic for finding a list of documents in a MongoDB collection into its own function named mdbFindDocumentsByField (just like I did with pgQuery). Eventually, there will be more functions of this nature, and they can be abstracted, maintained, and tested separately from the application-level logic that uses them. In fact, I’m already thinking these functions should be moved to the "client" files, but I’ll keep them next to their usages for simplicity.

The abstraction I did for mdbFindDocumentsByField is a bit more detailed than what I did for pgQuery because the mongo driver’s API uses JavaScript objects (vs. SQL text statements in the pg driver). The level of abstraction is a matter of preference, but what I tried to achieve here is to keep everything related to the mongo driver separate from the application-level logic (to make the code DataLoader-compatible, for example). An argument can be made that this is premature abstraction, but I’ve been down this road many times, and I’ve found these abstractions greatly helpful as a project gets bigger.

Because the mongoApi.approaches.detailLists function is used as the batch loading function for a DataLoader instance, it needs to maintain the size and order of the input array of approachIds. Let’s figure out what we need here one piece at a time.

First, remember that the approachIds values are coming from PostgreSQL. This means we’ll need to filter the response we get out of MongoDB for the approachDetails collection using the pgId field on a document, to find the document associated with each approach. If there is no match in MongoDB, the Approach record simply has no detail records.

Using the same .map() trick we did in pgApi, here’s the skeleton of what we need in mongoApi.approaches.detailLists:

Listing 7.32. Changes in lib/db-api/mongo.js
const mongoDocuments = await mdbFindDocumentsByField({
  collectionName: 'approachDetails',
  fieldName: 'pgId',
  fieldValues: approachIds,
});

return approachIds.map((approachId) => {
  const approachDoc = mongoDocuments.find(
    (doc) => approachId === doc.pgId
  );

  if (!approachDoc) {
    return [];
  }

  const { explanations, notes, warnings } = approachDoc;   (1)

  // ·-·-·                                                   (2)
});
1 The destructured variables here will each hold an array of values. They can also be undefined
2 We need to restructure the raw MongoDB data here to match our GraphQL schema design

Once the ID-to-document mapping is done, note that each approachDetails document in MongoDB is an object whose properties represent the three content categories we designed for the ApproachDetailCategory ENUM type:

Listing 7.33. The ApproachDetailCategory ENUM type
enum ApproachDetailCategory {
  NOTE
  EXPLANATION
  WARNING
}

Each of these properties holds an array of text values. However, remember that we designed the ApproachDetail type to have a category field and a content field:

Listing 7.34. The ApproachDetail type in the SDL text
type ApproachDetail {
  category: ApproachDetailCategory!
  content: String!
}

This means we need to do a bit of logic to take an object in the form of:

{
  explanations: [explanationsValue1, ·-·-·],
  notes: [notesValue1, ·-·-·],
  warnings: [warningsValue1, ·-·-·],
}

And convert it to:

[
  {
    content: explanationsValue1,
    category: "EXPLANATION"
  },
  {
    content: notesValue1,
    category: "NOTE"
  },
  {
    content: warningsValue1,
    category: "WARNING"
  },
  ·-·-·
]

Furthermore, the content categories are optional in an approachDetails document. One category might have 10 values while another might not exist at all (and be undefined, as noted in listing 7.32).

Considering all these points, here’s one way to do the needed conversion for the schema:

Listing 7.35. Changes in lib/db-api/mongo.js
return approachIds.map((approachId) => {
  const approachDoc = mongoDocuments.find(
    (doc) => approachId === doc.pgId
  );
  if (!approachDoc) {
    return [];
  }
  const { explanations, notes, warnings } = approachDoc;
  const approachDetails = [];
  if (explanations) {
    approachDetails.push(
      ...explanations.map((explanationText) => ({
        content: explanationText,
        category: 'EXPLANATION',
      }))
    );
  }
  if (notes) {
    approachDetails.push(
      ...notes.map((noteText) => ({
        content: noteText,
        category: 'NOTE',
      }))
    );
  }
  if (warnings) {
    approachDetails.push(
      ...warnings.map((warningText) => ({
        content: warningText,
        category: 'WARNING',
      }))
    );
  }
  return approachDetails;
});

We start with an empty array (approachDetails). Then, for each array-valued property on an approachDetails document, if that property exists, we push into the approachDetails array all the items of that property after mapping them to the { content: '·-·-·', category: '·-·-·' } structure.

There is certainly a chance here to replace the three if statements in listing 7.35 with one dynamic loop and allow for future values in the ApproachDetailCategory ENUM. I’ll leave that for you as an exercise; one possible approach is sketched below.
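
A sketch of that approach (hypothetical map name; the final codebase may do it differently):

// Drive the conversion from a single property-to-category map
// instead of three hardcoded if statements.
const categoryByProperty = {
  explanations: 'EXPLANATION',
  notes: 'NOTE',
  warnings: 'WARNING',
};

const approachDetails = Object.entries(categoryByProperty).flatMap(
  ([property, category]) =>
    (approachDoc[property] || []).map((content) => ({ content, category }))
);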

We can now use the new detailLists DataLoader instance to resolve the detailList field on the Approach type:

Listing 7.36. The detailList field in the Approach type
type Approach implements Node {
  id: ID!
  createdAt: String!
  content: String!
  voteCount: Int!
  author: User!
  detailList: [ApproachDetail!]!
}

We’ll need to define 2 new types in this GraphQL schema. An ApproachDetail type and an ApproachDetailCategory type. Let’s start with the latter. It’s an ENUM with 3 fixed values. To define an ENUM type, the GraphQL.js API provides a GraphQLEnumType constructor that takes a configuration object representing the ENUM values. Here’s what I did to define the ApproachDetailCategory type. Put this in a new file at lib/schema/types/approach-detail-category.js:

Listing 7.37. New file: lib/schema/types/approach-detail-category.js
import {
  GraphQLEnumType
} from 'graphql';

const ApproachDetailCategory = new GraphQLEnumType({
  name: 'ApproachDetailCategory',
  values: {
    NOTE: {},           (1)
    EXPLANATION: {},    (1)
    WARNING: {},        (1)
  },
});

export default ApproachDetailCategory;
1 These objects can be used to specify a description per value or to deprecate a value. Also, if the values in the database are stored differently, like with numbers, you can do the string-to-number mapping in each value’s configuration object (see the sketch below).
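
Here’s a hypothetical variation of the same type showing what those configuration objects can carry; none of it is needed for our schema:

import { GraphQLEnumType } from 'graphql';

const ApproachDetailCategory = new GraphQLEnumType({
  name: 'ApproachDetailCategory',
  values: {
    // Internal value mapping: resolvers can return 1 and the API
    // consumer still sees the string "NOTE"
    NOTE: { value: 1, description: 'A general note on the approach' },
    EXPLANATION: { value: 2 },
    // Hypothetical deprecation, just to show the syntax
    WARNING: { value: 3, deprecationReason: 'Use NOTE instead' },
  },
});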

Now we can define the ApproachDetail type, which uses the ApproachDetailCategory type. The ApproachDetail type is a simple instance of GraphQLObjectType. Create a new file for it at lib/schema/types/approach-detail.js:

Listing 7.38. New file: lib/schema/types/approach-detail.js
import {
  GraphQLObjectType,
  GraphQLString,
  GraphQLNonNull,
} from 'graphql';

import ApproachDetailCategory from './approach-detail-category.js';

const ApproachDetail = new GraphQLObjectType({
  name: 'ApproachDetail',
  fields: {
    content: {
      type: new GraphQLNonNull(GraphQLString),
    },
    category: {
      type: new GraphQLNonNull(ApproachDetailCategory),
    },
  },
});

export default ApproachDetail;

Finally, we’ll need to define the detailList field itself on the Approach type. This is where we use the new DataLoader instance and it’s exactly the same as we did for the previous ones. Hopefully this is getting easier now!

Listing 7.39. Changes in lib/schema/types/approach.js
import {
  // ·-·-·
  GraphQLList,
} from 'graphql';

import ApproachDetail from './approach-detail.js';

const Approach = new GraphQLObjectType({
  name: 'Approach',
  fields: {
    // ·-·-·

    detailList: {
      type: new GraphQLNonNull(
        new GraphQLList(new GraphQLNonNull(ApproachDetail))
      ),
      resolve: (source, args, { loaders }) =>
        loaders.detailLists.load(source.id),
    },
  },
});

You can now test this new feature with the following query:

Listing 7.40. An example query to test the detailList field
{
  needFeaturedList {
    content
    approachList {
      content
      detailList {
        content
        category
      }
    }
  }
}

For each of the three featured needs in our seed data, this query fetches all the approaches from PostgreSQL and then fetches all the details of every approach from MongoDB. Guess how many times it reaches out to MongoDB?

ONE time.

Thanks to the DataLoader instance, all of the MongoDB operations are batched into a single one. If you manage to tail the logs of your MongoDB server, here’s the related excerpt you’ll see:

command azdev.approachDetails command: find { find: "approachDetails", filter: { pgId: { $in: [ 1, 2, 3, 4 ] } }, returnKey: false, showRecordId: false, lsid: { id: UUID("·-·-·") }, $db: "azdev" } ·-·-·

4. Single resource fields

In our schema plan, the needInfo query root field is supposed to fetch the information for a single need identified by an ID that the API consumer can send as a field argument:

Listing 7.41. The needInfo root field and its id argument
type Query {
  needInfo(id: ID!): Need
  # ·-·-·
}

Here’s a query that we can use to work through this field:

Listing 7.42. Test query for the needInfo field
query needInfoTest {
  needInfo(id: 1) {
    id
    content
    author {
      id
    }
  }
}

Note how this field has to support the nested author information. But guess what? We don’t need to do anything new for that. We already implemented this relation under needFeaturedList, and the needInfo field uses the same output type (Need).

One might be tempted to think that since the needInfo field works with a single record, there is no need to use DataLoader for it. However, using DataLoader is preferable for many reasons. For one, having all database fetch requests go through DataLoader is simply good code abstraction practice, but there is also real performance value for composite queries. For example, take a look at this query:

Listing 7.43. Test query with multiple needInfo fields
query manyNeedInfoTest {
  need1: needInfo(id: 1) {
    id
    content
    author {
      id
    }
  }
  need2: needInfo(id: 2) {
    id
    content
    author {
      id
    }
  }
}

Given that these two Need records were created by different authors, how many SQL statements are required to satisfy this query?

If everything is done through DataLoader, only two SQL statements are required.

Before we implement the needInfo field, let’s do a little refactoring. The lib/schema/index.js file is currently doing two things that are logically separate: it defines the main Query type, and it uses that type to create a schema. Let’s split these two tasks into two files.

Listing 7.44. The new lib/schema/index.js
import {
  GraphQLSchema,
  printSchema,
} from 'graphql';

import QueryType from './queries.js';

export const schema = new GraphQLSchema({
  query: QueryType,
});

console.log(printSchema(schema));

Move everything else that was in that file to lib/schema/queries.js and export the main QueryType from that file:

Listing 7.45. New file: lib/schema/queries.js
// ·-·-·

const QueryType = new GraphQLObjectType({
  // ·-·-·
});

export default QueryType;

To support the needInfo field, we need a DataLoader instance that loads Need records by their IDs. This is similar to the users DataLoader instance:

Listing 7.46. Changes to the loaders object in lib/server.js
const loaders = {
  users: new DataLoader((ids) => pgApi.users.byIds(ids)),
  needs: new DataLoader((ids) => pgApi.needs.byIds(ids)),

  // ·-·-·
};

Now we can implement the root field in lib/schema/queries.js (the new file we extracted). There is nothing new about this field so try to implement it yourself before checking out my implementation.

Listing 7.47. Changes in lib/schema/queries.js
import {
  GraphQLID,
  GraphQLObjectType,
  GraphQLString,
  GraphQLInt,
  GraphQLNonNull,
  GraphQLList,
} from 'graphql';

// ·-·-·

const QueryType = new GraphQLObjectType({
  name: 'Query',
  fields: {
    // ·-·-·
    needInfo: {
      type: Need,
      args: {
        id: { type: new GraphQLNonNull(GraphQLID) },
      },
      resolve: async (source, args, { loaders }) => {
        return loaders.needs.load(args.id);
      },
    },
  },
});

The implementation of the pgApi.needs.byIds batch-loading method is also very similar to pgApi.users.byIds:

Listing 7.48. Changes in lib/db-api/pg.js
import pgClient from '../db-clients/pg.js';

const sqls = {
  needsInfo: `
    SELECT * FROM azdev.needs WHERE id = ANY($1)
  `,
  // ·-·-·
};

const pgApiWrapper = async () => {
  // ·-·-·

  return {
    needs: {
      byIds: async (ids) => {
        const pgResp = await pgQuery(sqls.needsInfo, [ids]);

        return ids.map((id) =>
          pgResp.rows.find((row) => id == row.id)             (1)
        );
      },
      // ·-·-·
    },

    // ·-·-·
  };
};
1 Note the loose equality operator (==) here.

That’s it! Go ahead and test the manyNeedInfoTest query in Listing 4.43 (you might need to change the IDs values for your test data).

Challenge

Can you think of the reason why I used the abstract (or loose) equality operator in Listing 7.48?

Answer: The id argument in needInfo is defined with the GraphQLID type. GraphQL casts any value you pass to a GraphQLID field to a string. The actual id values coming from PostgreSQL are integers (because all primary key fields were defined using the serial type). Comparing integers to strings is one of the rare cases where the loose equality operator is useful.

Alternatively, you can do the number-to-string casting yourself before doing the comparison, as shown below.
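
Here’s the comparison in a nutshell:

// GraphQL coerces a GraphQLID input like 1 into the string '1',
// while PostgreSQL returns integers for serial primary keys:
1 == '1';  // true  (loose equality coerces the operands)
1 === '1'; // false (strict equality does not)

// The explicit alternative:
String(1) === '1'; // true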

5. The viewer field

We’ve previously implemented the User type but without the needList field. Now that we are implementing the viewer field, it’s time to also support the needList field on the User type but restrict it to only be available within the viewer field scope.

Let’s start with the viewer field itself first. It takes an authToken argument and returns a User record. Let’s plan on having a usersByAuthTokens DataLoader instance to resolve its data:

Listing 7.49. Changes in lib/schema/queries.js
import User from './types/user.js';

const QueryType = new GraphQLObjectType({
  name: 'Query',
  fields: {
    // ·-·-·
    viewer: {
      type: User,
      args: {
        authToken: { type: new GraphQLNonNull(GraphQLString) },
      },
      resolve: async (source, args, { loaders }) => {
        return loaders.usersByAuthTokens.load(args.authToken);
      },
    },
  },
});

Now in lib/server.js, we need to add the usersByAuthTokens DataLoader instance. Let’s plan on having a pgApi.users.byAuthTokens batch-loading method:

Listing 7.50. Changes to the loaders object in lib/server.js
const loaders = {
  usersByAuthTokens: new DataLoader((tokens) =>
    pgApi.users.byAuthTokens(tokens)
  ),
  // ·-·-·
};

The byAuthTokens function will have to do a match in the database against the hashed_auth_token column in the azdev.users table. The value stored in the database is hashed using PostgreSQL’s crypt function (which is part of the pgcrypto extension). To match the plain-text token to the hashed one, we need to use the same crypt function, but this time passing the hashed value itself as the "salt" of the operation. The details of this PostgreSQL-specific operation are beyond the scope of this book, but you can read the documentation for it at jscomplete.com/pgcrypto.

We also need to do the "promise-map trick" for this batch-loading method to make it DataLoader-compliant. Here are the changes that are needed in lib/db-api/pg.js to implement the byAuthTokens method:

Listing 7.51. Changes in lib/db-api/pg.js
const sqls = {
  usersInfoByAuthToken: `
    SELECT *
    FROM azdev.users
    WHERE hashed_auth_token = crypt($1, hashed_auth_token)
  `,
  // ·-·-·
};

const pgApiWrapper = async () => {
  // ·-·-·

  return {
    // ·-·-·

    users: {
      // ·-·-·
      byAuthTokens: async (tokens) => {
        const results = tokens.map(async (token) => {
          const pgResp = await pgQuery(
            sqls.usersInfoByAuthToken,
            [token]
          );
          return pgResp.rows[0];
        });
        return Promise.all(results);
      },
    },

    // ·-·-·
  };
};

This takes care of loading an authorized user record under the viewer field. To test it, note that the seed-data script uses the authToken value "testToken123" for the user it imports. You can use this query to test what we’ve done so far:

{
  viewer(authToken: "testToken123") {
    id
    email
  }
}

However, this whole viewer scope feature was added so that we can give authorized users a way to see all their Need records. We need to implement the needList field under the User type next.

5.1. The needList field

To do that, we need to distinguish between a needList field used on any user record (in any scope) and a needList field used within the viewer field scope.

There are multiple ways to do that. One of them is to utilize the fourth info argument to figure out the "path" of a needList field. This might work okay, but I am going to use a different method.

Since we have different SQL SELECT statements for a user record fetched by an id value and a user record fetched with an authToken value, we can use their returned objects in the resolvers (where they become the source object under the User type) to determine whether a needList field should be supported. In fact, let’s make the same authToken value (coming from the viewer field arguments) available to the needList field and have the query for needList reuse that value to load only the Need records associated with it (by reusing the crypt compare statement in listing 7.51).

To do that, modify the usersInfoByAuthToken SQL statement to include the passed in token:

Listing 7. 52. Including an input value in a SELECT statement
const sqls = {
  usersInfoByAuthToken: `
    SELECT *, $1 as auth_token
    FROM azdev.users
    WHERE hashed_auth_token = crypt($1, hashed_auth_token)
  `,
  // ·-·-·
};

Now the auth_token field can be accessed within a needList resolver through the source argument (the 1st argument in each resolver). Here are the changes we need to make to the User type:

Listing 7. 53. Changes in lib/schema/types/user.js (in bold)
import {
  GraphQLID,
  GraphQLObjectType,
  GraphQLString,
  GraphQLNonNull,
  GraphQLList,
} from 'graphql';

import Need from './need.js';

const User = new GraphQLObjectType({
  name: 'User',
  fields: () => {
    return {
      // ·-·-·
      needList: {
        type: new GraphQLNonNull(
          new GraphQLList(new GraphQLNonNull(Need))
        ),

        resolve: (source, args, { loaders }) => {
          if (!source.auth_token) {                    (1)
            return [];
          }
          return loaders.needsForUser.load(source.auth_token);
        },
      },
    };
  },
});
1 The auth_token property is now available because of the change we made to the usersInfoByAuthToken SQL statement in listing 7.52.

Note one important thing about this change: I changed the fields configuration property to be a function instead of an object. This is supported out of the box in GraphQL.js, and it’s handy for situations like this one. Because we’re adding a new dependency to the User type so that it can load Need records, and because the Need type already depends on the User type (because of the author relation), this change introduces a "circular dependency" between these Node.js modules. Changing the fields configuration property to a function delays resolving that dependency: the function is not executed until after Node.js has loaded all modules, so the reference is read dynamically instead of statically. It’s a good practice to ALWAYS use the function form for the fields configuration property instead of the object form.
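Here’s a minimal illustration of the difference, with made-up TypeA/TypeB names (these two declarations are alternatives, not code to use together):

import { GraphQLObjectType } from 'graphql';

// Object form (eager): the TypeB reference is read the moment this
// module is evaluated. If TypeB's module also imports this one, that
// reference may still be undefined at this point.
const EagerTypeA = new GraphQLObjectType({
  name: 'TypeA',
  fields: {
    b: { type: TypeB },
  },
});

// Function form (lazy): the TypeB reference is not read until
// GraphQL.js first asks for the fields, after all modules have loaded.
const LazyTypeA = new GraphQLObjectType({
  name: 'TypeA',
  fields: () => ({
    b: { type: TypeB },
  }),
});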

So with the code in listing 7.53, if the source object was loaded using the usersInfoByAuthToken SQL statement, the resolver will use a new DataLoader instance to load the list of Need records for that user (identified by an authToken value).

Let’s define the newly-used loaders.needsForUser function:

Listing 7. 54. Changes to the loaders object in lib/server.js
const loaders = {
  // ·-·-·
  needsForUser: new DataLoader((authTokens) =>
    pgApi.needs.forUsers(authTokens)
  ),
  // ·-·-·
};

Finally, the implementation of pgApi.needs.forUsers will make use of the same hash-compare trick to identify the user record associated with an authToken value. Then, it uses a database JOIN operation to return all the Need records for that user:

Listing 7. 55. Changes in lib/db-api/pg.js
const sqls = {
  userNeedList: `
    SELECT n.*
    FROM azdev.needs n, azdev.users u
    WHERE u.hashed_auth_token = crypt($1, u.hashed_auth_token)
    AND u.id = n.user_id
  `,
  usersInfoByAuthToken: `
    SELECT *, $1 as auth_token
    FROM azdev.users
    WHERE hashed_auth_token = crypt($1, hashed_auth_token)
  `,
};

const pgApiWrapper = async () => {
  // ·-·-·

  return {
    needs: {
      // ·-·-·
      forUsers: async (tokens) => {
        const results = tokens.map(async (token) => {
          const pgResp = await pgQuery(sqls.userNeedList, [
            token,
          ]);
          return pgResp.rows;
        });
        return Promise.all(results);
      },
    },

    // ·-·-·
  };
};

We could have skipped this second hash-compare operation because we already did it in the parent viewer scope, but I kept it for 2 reasons: to show you how to access a parent field’s argument value from a child field, and to ensure that no matter what surrounds it, the needList field cannot be used without an authToken value in scope (because it depends on it).

You can test now that the needList field is available under the viewer field scope:

Listing 7. 56. Query to test needList under viewer
{
  viewer(authToken: "testToken123") {
    id
    email
    needList {
      content
    }
  }
}

You can also make sure that needList does not work under a regular use of the User type with this query:

Listing 7. 57. Query to test needList under author
{
  needFeaturedList {
    content
    author {
      email
      needList {
        content
      }
    }
  }
}

The needList field should come back as an empty list; since author records are not loaded with the usersInfoByAuthToken statement, their source objects have no auth_token property.
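The response should have a shape roughly like the following (the content and email values are placeholders; yours will come from the seed data):

{
  "data": {
    "needFeaturedList": [
      {
        "content": "...",
        "author": {
          "email": "...",
          "needList": []
        }
      }
    ]
  }
}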

5.2. Dynamic types

Although the needList implementation so far is aware of the scope and will only work for authorized requests, it fundamentally has one problem. When it’s not supposed to work, it should not show up at all. It should not be part of the User GraphQL type. Right now, if you expand the auto-suggest list under needFeaturedList → author, the needList field will be there (and it should not). Let’s fix that.

The easiest fix is to simply use different GraphQL types for a user record under the two scopes (viewer and author). However, since these two types share most of their structure, we can extract the common parts in lib/schema/types/user.js and reuse them to create the two types: User (the generic type) and Viewer (which has the needList field).

Here’s what I came up with:

Listing 7. 58. Changes in lib/schema/types/user.js
import {
  GraphQLID,
  GraphQLObjectType,
  GraphQLString,
  GraphQLNonNull,
  GraphQLList,
} from 'graphql';

import Need from './need.js';

const fields = ({ viewerScope }) => {
  const userFields = {
    id: {
      type: new GraphQLNonNull(GraphQLID),
    },
    // ·-·-·
  };

  if (viewerScope) {
    userFields.needList = {
      type: new GraphQLNonNull(
        new GraphQLList(new GraphQLNonNull(Need))
      ),

      resolve: (source, args, { loaders }) => {
        if (!source.auth_token) {
          return [];
        }
        return loaders.needsForUser.load(source.auth_token);
      },
    };
  }

  return userFields;
};

const User = new GraphQLObjectType({
  name: 'User',
  fields: () => fields({ viewerScope: false }),
});

export const Viewer = new GraphQLObjectType({
  name: 'Viewer',
  fields: () => fields({ viewerScope: true }),
});

export default User;

This way, the file has 2 exports. The default export is the generic User type that can be used anywhere, and the other is a named export (Viewer) to be used under the viewer field scope.

Here’s how to use the new Viewer type:

Listing 7. 59. Changes in lib/schema/queries.js
// ·-·-·

import { Viewer } from './types/user.js';

const QueryType = new GraphQLObjectType({
  name: 'Query',
  fields: {
    // ·-·-·
    viewer: {
      type: Viewer,
      args: {
        authToken: { type: new GraphQLNonNull(GraphQLString) },
      },
      resolve: async (source, args, { loaders }) => {
        return loaders.usersByAuthTokens.load(args.authToken);
      },
    },
  },
});

With this, we are done with the query tree for this schema. We are now ready to tackle the mutation and subscription operations for the schema, which we will do in the next chapter.

6. Summary

  • To optimize data-fetching operations in a generic scalable way you can leverage the concepts of caching and batching.

  • You can cache SQL responses based on unique values like IDs or any other custom unique values you design in your API service.

  • You can delay asking the database about a certain resource until you have figured out the unique IDs of all the records needed from that resource, and then send a single request to the database for all the records based on those IDs.

  • DataLoader is a generic JavaScript library that can be used as part of your application’s data-fetching layer to provide a simplified and consistent API over various data sources and abstract the use of batching and caching. This enables you to focus on your application’s logic and safely distribute its data-fetching requirements without worrying about keeping the number of requests to your databases and other data sources to a minimum.

  • DataLoader instances are meant to be scoped to a single request, and they can be used both for ID-based SQL statements and for more complex statements like fetching lists or even full-text search (see the sketch after this list for the per-request scoping).

  • If you use DataLoader in your stack, you should do all database communications through it. Your GraphQL resolvers will then simply delegate the task of resolving the data to the DataLoader instances. This makes the code cleaner and more maintainable.
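To close, here is a sketch of the per-request scoping mentioned above. It assumes an Express server with the express-graphql middleware, and that schema and a single long-lived pgApi object are already set up; adapt the names to your own stack:

server.use(
  '/',
  graphqlHTTP(() => {
    // Fresh DataLoader instances for every incoming request, so one
    // request's cache never leaks stale records into another request.
    const loaders = {
      usersByAuthTokens: new DataLoader((tokens) =>
        pgApi.users.byAuthTokens(tokens)
      ),
      // ·-·-·
    };

    return {
      schema,
      context: { loaders },
      graphiql: true,
    };
  })
);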