Review Architecture/Scale Plan

Cadence: Yearly

Sites: Production

	Initial	Recurring
Estimated Time	2-3 hrs	1 hr

Benefits:

Provide documentation
Increase architectural knowledge
Plan for future deployments

Goal

The goal is to have documented Qlik architecture diagrams of n and n+1 deployments*, as well as an understanding of high-level architectural concepts within Qlik.

* n refers to the current deployment, while n+1 refers to an anticipated state of the next deployment.

This is integral for:

understanding the deployment and how it is laid out
planning for growth
articulating change
understanding resiliency & availability
having documentation available to others

Supporting Documentation
Building an Architecture Diagram
Planning for N+1 Architectures
- High-level Scaling Concepts
- Review Capacity Plan

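Supporting Documentation

If any of the following topics are unfamiliar, review them before continuing with this page:

Architecture 101 (Components, Terminology)
Load Balancing Concepts
Resiliency & HA
Example Production Architectures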

Building an Architecture Diagram

Core Requirements

A diagramming editor, such as Visio, PowerPoint, or a web-based editor like Gliffy or Draw.io
A set of base icons or symbols
- If the deployment is on-premises:
  - a server
  - a database
  - a file share
  - a network load balancer
- If the deployment is in the cloud:
  - AWS: the AWS Architecture Icons set published by AWS
  - Azure: the Azure architecture icons published by Microsoft
  - GCP: the Google Cloud icons published by Google
Knowledge of which Qlik services are running on which nodes
Knowledge of what each Qlik node is used for
Knowledge of where the Qlik file share and the Qlik repository database are located
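The core requirements above amount to a small inventory. As a minimal sketch, assuming the inventory is recorded by hand, the hypothetical `Node`/`Deployment` structures and `missing_inventory` helper below flag gaps before any drawing starts. The node names, share path, and helper are illustrative only and are not part of any Qlik tooling.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    purpose: str = ""  # e.g. "central", "end-user engine", "scheduler"
    services: set[str] = field(default_factory=set)  # e.g. {"proxy", "engine"}


@dataclass
class Deployment:
    nodes: list[Node]
    file_share: str = ""     # location of the Qlik file share (UNC path)
    repository_db: str = ""  # host of the Qlik repository database


def missing_inventory(dep: Deployment) -> list[str]:
    """Return the core-requirement gaps that would block drawing the diagram."""
    gaps = []
    for node in dep.nodes:
        if not node.purpose:
            gaps.append(f"{node.name}: purpose not recorded")
        if not node.services:
            gaps.append(f"{node.name}: active services not recorded")
    if not dep.file_share:
        gaps.append("file share location not recorded")
    if not dep.repository_db:
        gaps.append("repository database location not recorded")
    return gaps


# Hypothetical example: three nodes, repository database location still unknown.
dep = Deployment(
    nodes=[
        Node("qlik-central", "central", {"repository", "engine", "proxy", "scheduler"}),
        Node("qlik-rim1", "end-user engine", {"repository", "engine", "proxy"}),
        Node("qlik-rim2"),
    ],
    file_share=r"\\fileserver\QlikShare",
)
for gap in missing_inventory(dep):
    print(gap)
```

Any gap printed here is something to confirm with the server owners before the diagram can be considered complete.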

Nice to Haves

Server names and aliases
Any network load balancers/interfaces in front of Qlik
Any firewall settings pertinent to Qlik

On-Premises Diagram Example

Growth Environment

Enterprise Environment

AWS Diagram Example

Enterprise Environment

Note

These cloud diagrams are intended for Qlik admins and are occasionally used to translate requirements for supporting lines of business (LOBs) such as IT. The examples do not conform to each cloud vendor's architectural diagram standards, as they are not designed to be consumed by cloud engineers, network admins, and similar audiences. Including VPCs, availability zones (AZs), security groups (SGs), and network ACLs is welcome; however, that goes beyond the basics of this exercise.

Note

SMB file share: either Amazon FSx (requires an Active Directory domain) or EBS storage attached to an EC2 instance.

Note

The Growth Environment layout can also be used in AWS, with both the repository database and the file share hosted on an EC2 instance.

Azure Diagram Example

Enterprise Environment

Note

These cloud diagrams are intended for Qlik admins and are occasionally used to translate requirements for supporting lines of business (LOBs) such as IT. The examples do not conform to each cloud vendor's architectural diagram standards, as they are not designed to be consumed by cloud engineers, network admins, and similar audiences. Including Virtual Networks, Subnets, Resource Groups, and similar constructs is welcome; however, that goes beyond the basics of this exercise.

Note

SMB file share: Azure Files, with the storage account credentials persisted using cmdkey or Windows Credential Manager.

Note

The Growth Environment layout can also be used in Azure, with both the repository database and the file share hosted on an Azure VM.

Planning for N+1 Architectures

To plan for an upcoming architectural event, it is imperative to understand the various methods of scaling a site, as well as any architectural impacts of the current Capacity Plan.

High-level Scaling Concepts

Broadly speaking, there are two primary scaling methodologies – however, do note that these are not mutually exclusive:

Horizontal Scaling
- Adding nodes/services, providing a wide, resilient topology.
Vertical Scaling
- Expanding current server footprints, e.g. adding cores/RAM.

Horizontal scaling is common when a Qlik environment has small to medium-sized applications with many users. Such applications load quickly onto many different engines with little delay, and their calculations are fast, so a shared cache is less important for them. This methodology is also common in on-premises virtual environments where VM sizes may be restricted. For instance, if an organization caps VMs at 96 or 128 GB of RAM, its Qlik environment will most likely end up with a wider footprint and will adopt practices that keep its applications small enough to fit.
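A quick way to see whether a VM size cap will force a wider footprint is to compare each app's in-memory size against the RAM usable on one capped VM. The sketch below assumes per-app RAM footprints are already available from monitoring; the app names, sizes, and the 25% reserve for the OS, user sessions, and result cache are hypothetical figures, not a Qlik sizing rule.

```python
def split_by_fit(app_ram_gb: dict[str, float], vm_cap_gb: float,
                 reserved_fraction: float = 0.25) -> tuple[float, list[str], list[str]]:
    """Return usable RAM per VM, apps that fit on one VM, and apps that do not.

    reserved_fraction is a hypothetical share of each VM's RAM held back for the
    OS, user sessions, and result cache; replace it with your own measurements.
    """
    usable_gb = vm_cap_gb * (1 - reserved_fraction)
    fits = sorted(a for a, gb in app_ram_gb.items() if gb <= usable_gb)
    too_big = sorted(a for a, gb in app_ram_gb.items() if gb > usable_gb)
    return usable_gb, fits, too_big


# Hypothetical app footprints against a 128 GB VM cap.
apps = {"Sales": 12.0, "Finance": 45.0, "Supply Chain": 110.0}
usable, fits, too_big = split_by_fit(apps, vm_cap_gb=128)
print(f"usable per VM: {usable} GB")  # usable per VM: 96.0 GB
print("fits:", fits)                  # fits: ['Finance', 'Sales']
print("must shrink or needs a larger VM:", too_big)  # ['Supply Chain']
```

Apps in the second list are the ones that drive either app optimization or a conversation about vertical growth.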

Vertical scaling is common where the user base is not extensive and the applications are quite large. Fewer nodes with larger capacity can host larger applications, and more users take advantage of the same cache on each node. These applications are usually cache-warmed so that they are readily available to users without delay.
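To make the cache argument concrete, the sketch below compares a wide and a narrow topology with the same total RAM (768 GB). Each Qlik engine holds its own in-memory copy of an opened app and its own result cache; the sketch additionally assumes, for illustration, that one app is open on every engine node and that users are balanced evenly. The node counts, sizes, and user figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Topology:
    name: str
    engine_nodes: int
    ram_per_node_gb: int


def app_spread(topo: Topology, app_gb: int, concurrent_users: int) -> dict:
    """Illustrative only: assumes the app is open on every engine node and
    users are balanced evenly across those nodes."""
    return {
        "copies_in_ram": topo.engine_nodes,
        "total_app_ram_gb": app_gb * topo.engine_nodes,
        "headroom_per_node_gb": topo.ram_per_node_gb - app_gb,
        "users_per_cache": concurrent_users / topo.engine_nodes,
    }


wide = Topology("wide", engine_nodes=6, ram_per_node_gb=128)      # 768 GB total
narrow = Topology("narrow", engine_nodes=2, ram_per_node_gb=384)  # 768 GB total
for topo in (wide, narrow):
    print(topo.name, app_spread(topo, app_gb=40, concurrent_users=300))
```

With the same total RAM, the narrow topology holds two copies of the 40 GB app instead of six, leaves far more headroom per node, and puts 150 users on each cache instead of 50.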

Both methodologies are frequently combined when an organization has a mix of very large apps and smaller apps with a wide user pool. It is common for such organizations to have “small-medium app engines” and “large app engines”, for example four of the former and two of the latter. Using load balancing rules (as described in Review Pinning/Load Balancing), large applications are “pinned” to the larger nodes, and smaller applications to the smaller nodes.
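The pinning decision itself can be modeled before any rules are written. The sketch below is a hypothetical model, not Qlik's rule engine: the engine names, the 20 GB cutoff, and the helper functions are assumptions. In Qlik Sense, the actual assignment is made with load balancing rules in the QMC, as described in Review Pinning/Load Balancing. The resiliency check reflects the 2+ nodes per group consideration raised in the Capacity Plan questions.

```python
# Hypothetical engine groups: four small-medium engines and two large engines.
ENGINE_GROUPS: dict[str, list[str]] = {
    "small-medium": ["engine1", "engine2", "engine3", "engine4"],
    "large": ["engine5", "engine6"],
}
LARGE_APP_GB = 20  # hypothetical cutoff between app size classes


def size_class(app_gb: float) -> str:
    return "large" if app_gb >= LARGE_APP_GB else "small-medium"


def eligible_engines(app_gb: float, groups: dict[str, list[str]] = ENGINE_GROUPS) -> list[str]:
    """Engines an app would be pinned to, based on its size class."""
    return groups[size_class(app_gb)]


def groups_lacking_resiliency(groups: dict[str, list[str]], minimum: int = 2) -> list[str]:
    """Groups with fewer than `minimum` nodes have no failover target."""
    return [name for name, nodes in groups.items() if len(nodes) < minimum]


print(eligible_engines(45))  # ['engine5', 'engine6']
print(eligible_engines(3))   # ['engine1', 'engine2', 'engine3', 'engine4']
print(groups_lacking_resiliency(ENGINE_GROUPS))  # []
```

If `groups_lacking_resiliency` returns a group, that is a signal to add engine nodes before pinning apps to it.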

Review Capacity Plan

In order to plan for the next architectural event, one must first review the current Capacity Plan.

Common questions that affect the next-state architecture:

Is there a significant license growth event that would require additional proxy/engine nodes?
Across end-user engine nodes, are CPU and RAM metrics consistently healthy? If not, this might warrant vertical growth or app optimization.
Are intra-day reloads running on end-user nodes and degrading performance and, therefore, the end-user experience? That could warrant offloading them to dedicated schedulers (highly encouraged and preferred).
Are any applications being considered for “application pinning” to specific engine nodes via load balancing rules? Could application optimization reduce these applications in size to avoid that, or are they monolithic by nature and need to be pinned? Are there currently enough engine nodes to support the segregation of assets while providing resiliency (2+ nodes for each group), or do more engine nodes need to be added? Is vertical growth required to support these large applications on fewer nodes, given that more users may be cached on each of those nodes?
Is horizontal or vertical growth preferred, or is there a business event driving one or the other? Is a mix of both possible? This will involve discussions with IT to see what is possible.
Is something outside the Qlik deployment driving an architectural event? For example, there may be budget for infrastructure now, though a license event might not occur for another 6 months. This will involve speaking with the business about which applications and use cases are in the pipeline, to determine what infrastructure should support future needs.
Is ODAG (On-Demand App Generation) in play or planned? Should these application reloads happen on a dedicated scheduler?
Is Qlik NPrinting or Qlik InsightBot in play or on the horizon? Should these run against dedicated engines?

These are all questions that should be considered while planning for the next-state architecture.
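Two of these questions (resource health and intra-day reloads on end-user nodes) can be answered directly from node metrics gathered for the Capacity Plan. As a minimal sketch, assuming per-node 95th-percentile CPU/RAM figures and a count of reloads run during business hours have been exported, the hypothetical helper below turns them into findings. The 80% thresholds, node names, and metric values are assumptions to adjust to your own standards.

```python
from dataclasses import dataclass

CPU_LIMIT_PCT = 80  # hypothetical threshold
RAM_LIMIT_PCT = 80  # hypothetical threshold


@dataclass
class NodeMetrics:
    name: str
    role: str                   # "end-user" or "scheduler"
    p95_cpu_pct: float          # 95th percentile CPU over the review window
    p95_ram_pct: float          # 95th percentile RAM over the review window
    business_hour_reloads: int  # reloads that ran on this node during business hours


def review_findings(nodes: list[NodeMetrics]) -> list[str]:
    """Flag end-user nodes under sustained pressure or running intra-day reloads."""
    findings = []
    for n in nodes:
        if n.role != "end-user":
            continue  # scheduler load is expected; these questions target end-user nodes
        if n.p95_cpu_pct > CPU_LIMIT_PCT or n.p95_ram_pct > RAM_LIMIT_PCT:
            findings.append(f"{n.name}: sustained CPU/RAM pressure; consider vertical growth or app optimization")
        if n.business_hour_reloads > 0:
            findings.append(f"{n.name}: {n.business_hour_reloads} intra-day reloads; consider a dedicated scheduler")
    return findings


nodes = [
    NodeMetrics("engine1", "end-user", 55.0, 86.0, 0),
    NodeMetrics("engine2", "end-user", 48.0, 60.0, 12),
    NodeMetrics("sched1", "scheduler", 92.0, 70.0, 30),
]
for finding in review_findings(nodes):
    print(finding)
```

The remaining questions (licensing, business drivers, ODAG, NPrinting, InsightBot) depend on conversations with IT and the business rather than metrics.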

Tags

#yearly

#system_planning

#architecture

#scale
