Data Structure in JavaScript
At a high level, there are basically three types of data structures.
Stacks and Queues are array-like structures that differ only in how items are inserted and removed.
Linked Lists, Trees, and Graphs are structures with nodes that keep references to other nodes.
Hash Tables depend on hash functions to save and locate data.
In terms of complexity,
Stacks
and Queues
are the simplest and can be constructed from Linked Lists
. Trees
and Graphs
are the most complex because they extend the concept of a linked list. Hash Tables
need to utilize these data structures to perform reliably.
In terms of efficiency,
Linked Lists are most optimal for recording and storing of data,
while Hash Tables are most performant for searching and retrieving of data.
Stack
Arguably the most important
Stack
in JavaScript is the call stack where we push in the scope of a function
whenever we execute it. Programmatically, it’s just an array
with two principled operations: push
and pop
. Push addselements to the top of the array, while Pop removes them from the same location. In other words, Stacks follow the “Last In, First Out” protocol (LIFO).
Below is an example of a
Stack
in code. Notice that we can reverse the order of the stack: the bottom becomes the top and the top becomes the bottom. As such, we can use the array’s unshift
and shift
methods in place of push
and pop
, respectively.Queue
JavaScript is an event-driven programming language which makes it possible to support non-blocking operations. Internally, the browser manages only one thread to run the entire JavaScript code, using the event queue to enqueue
listeners
and the event loop to listen for the registered events
. To support asynchronicity in a single-threaded environment (to save CPU resources and enhance the web experience), listener functions
dequeue and execute only when the call stack is empty. Promises
depend on this event-driven architecture to allow a “synchronous-style” execution of asynchronous code that does not block other operations.
Programmatically,
Queues
are just arrays with two primary operations: unshift
and pop
. Unshift enqueues items to the end of the array, while Pop dequeues them from the beginning of the array. In other words, Queues follow the “First In, First Out” protocol (FIFO). If the direction is reversed, we can replace unshift
and pop
with push
and shift
, respectively.Linked List
Like arrays,
Linked Lists
store data elements in sequential order. Instead of keeping indexes, linked lists hold pointers to other elements. The first node is called the head while the last node is called the tail. In a singly-linked list, each node has only one pointer to the next node. Here, the head is where we begin our walk down the rest of the list. In a doubly-linked list, a pointer to the previous node is also kept. Therefore, we can also start from the tail and walk “backwards” toward the head.
Linked lists have constant-time insertions and deletions because we can just change the pointers. To do the same operations in arrays requires linear timebecause subsequent items need to be shifted over. Also, linked lists can grow as long as there is space. However, even “dynamic” arrays that automatically resize could become unexpectedly expensive. Of course, there’s always a tradeoff. To lookup or edit an element in a linked list, we might have to walk the entire length which equates to linear time. With array indexes, however, such operations are trivial.
Like arrays, linked lists can operate as stacks. It’s as simple as having the head be the only place for insertion and removal. Linked lists can also operate as queues. This can be achieved with a doubly-linked list, where insertion occurs at the tail and removal occurs at the head, or vice versa.
Linked lists are useful on both the client and server. On the client, state management libraries like Redux structure its middleware logic in a linked-list fashion. When actions are dispatched, they are piped from one middleware to the next until all is visited before reaching the reducers. On the server, web frameworks like Express also structure its middleware logic in a similar fashion. When a request is received, it is piped from one middleware to the next until a response is issued.
Tree
A
Tree
is like a linked list, except it keeps references to many child nodes in a hierarchical structure. In other words, each node can have no more than one parent. The Document Object Model (DOM) is such a structure, with a root html
node that branches into the head
and body
nodes, which further subdivide into all the familiar html tags. Under the hood, prototypal inheritance and composition with React components also produce tree structures. Of course, as an in-memory representation of the DOM, React’s Virtual DOM is also a tree structure.
The Binary Search Tree is special because each node can have no more than two children. The left child must have a value that is smaller than or equal to its parent, while the right child must have a greater value. Structured and balanced in this way, we can search for any value in logarithmic time because we can ignore one-half of the branching with each iteration. Inserting and deleting can also happen in logarithmic time. Moreover, the smallest and largest value can easily be found at the leftmost and rightmost leaf, respectively.
Traversal through the tree can happen in a vertical or horizontal procedure. In Depth-First Traversal (DFT) in the vertical direction, a recursive algorithm is more elegant than an iterative one. Nodes can be traversed in pre-order, in-order, or post-order. If we need to explore the roots before inspecting the leaves, we should choose pre-order. But, if we need to explore the leaves before the roots, we should choose post-order. As its name implies, in-order enables us to traverse the nodes in sequential order. This property makes Binary Search Trees optimal for sorting.
In Breadth-First Traversal (BFT) in the horizontal direction, an iterative approach is more elegant than a recursive one. This requires the use of a
queue
to track all the children nodes with each iteration. The memory needed for such a queue might not be trivial, however. If the shape of a tree is wider than deep, BFT is a better choice than DFT. Also, the path that BFT takes between any two nodes is the shortest one possible.Graph
If a tree is free to have more than one parent, it becomes a
Graph
. Edges that connect nodes together in a graph can be directed or undirected, weighted or unweighted. Edges that have both direction and weight are analogous to vectors.
Multiple inheritances in the form of Mixins and data objects that have many-to-many relationships produce graph structures. A social network and the Internet itself are also graphs. The most complicated graph in nature is our human brain, which neural networks attempt to replicate to give machines super intelligence.
Hash Table
A Hash Table is a dictionary-like structure that pairs keys to values. The location in memory of each pair is determined by a
hash function
, which accepts a key and returns the address where the pair should be inserted and retrieved. Collisions can result if two or more keys convert to the same address. For robustness, getters
and setters
should anticipate these events to ensure that all data can be recovered and no data is overwritten. Usually, linked lists
offer the simplest solution. Having very large tables also helps.
If we know our addresses will be in integer sequences, we can simply use
Arrays
to store our key-value pairs. For more complex address mappings, we can use Maps
or Objects
. Hash tables have insertion and lookup of constanttime on average. Because of collisions and resizing, this negligible cost could grow to linear time. In practice, however, we can assume that hash functions are clever enough that collisions and resizing are rare and cheap. If keys represent addresses, and therefore no hashing is needed, a simple object literal
can suffice. Of course, there’s always a tradeoff. The simple correspondence between keys and values, and the simple associativity between keys and addresses, sacrifice relationships between data. Thus, hash tables are suboptimal for storing data.
If a tradeoff decision favors retrieval over storage, no other data structure can match the speed of hash tables for lookup, insertion, and deletion. It’s no surprise, therefore, that it’s used everywhere. From the database, to the server, to the client, hash tables, and in particular, hash functions, are crucial to the performance and security of software applications. The speed of database queries relies heavily upon keeping tables of indexes that point to records in sorted order. This way, binary searches can be performed in logarithmic time, a huge performance win especially for Big Data.
On both the client and server, many popular libraries utilize memoization to maximize performance. By keeping a record of the inputs and outputs in a hash table, functions run only once for the same inputs. The popular Reselect library uses this caching strategy to optimize
mapStateToProps
functions in Redux-enabled applications. In fact, under the hood, the JavaScript engine also utilizes hash tables called heaps to store all the variables
and primitives
we create. They are accessed from pointers on the call stack.
The Internet itself also relies on hashing algorithms to function securely. The structure of the internet is such that any computer can communicate with any other computer through a web of interconnected devices. Whenever a device logs onto the internet, it also becomes a router through which data streams can travel. However, it’s a double-edged sword. A decentralized architecture means any device in the network can listen in and tamper with the data packages that it helps to relay. Hash functions such as MD5 and SHA256 play a critical role in preventing such man-in-the-middle attacks. E-commerce over HTTPS is safe only because these hashing functions are used.
Inspired by the Internet, blockchain technologies seek to open source the very structure of the web at the protocol level. By using hash functions to create immutable fingerprints for every block of data, essentially the entire database can exist openly on the web for anyone to see and contribute to. Structurally, blockchains are just singly-linked lists of binary trees of cryptographic hashes. The hashing is so cryptic that a database of financial transactions can be created and updated out in the open by anyone! The incredible implication is the awesome power to create money itself. What was once only possible for governments and central banks, now anyone can securely create his or her own currency! This is the key insight realized by the founder of Ethereum and the pseudonymous founder of Bitcoin.
As more and more databases move out into the open, the need for Frontend Engineers who can abstract away all the low-level cryptographic complexities will compound as well. In this future, the main differentiator will be the user experience.