Name

    ARB_compute_variable_group_size

Name Strings

    GL_ARB_compute_variable_group_size

Contact

    Pat Brown, NVIDIA Corporation (pbrown 'at' nvidia.com)

Contributors

    Slawomir Grajewski, Intel Corporation
    Jeannot Breton, NVIDIA
    Daniel Koch, NVIDIA

Notice

    Copyright (c) 2013 The Khronos Group Inc. Copyright terms at
        http://www.khronos.org/registry/speccopyright.html

Status

    Complete. Approved by the ARB on June 3, 2013.
    Ratified by the Khronos Board of Promoters on July 19, 2013.

Version

    Last Modified Date:         May 30, 2013
    Revision:                   8

Number

    ARB Extension #153

Dependencies

    This extension is written against the OpenGL 4.3 (Compatibility Profile)
    Specification, dated August 6, 2012.

    This extension is written against the OpenGL Shading Language
    Specification, Version 4.30, Revision 7, dated September 24, 2012.

    OpenGL 4.3 or ARB_compute_shader is required.

    This extension interacts with NV_compute_program5.

Overview

    This extension allows applications to write generic compute shaders that
    operate on work groups with arbitrary dimensions.  Instead of specifying a
    fixed work group size in the compute shader, an application can declare
    the /local_size_variable/ layout qualifier in the shader to indicate a
    variable work group size.  When using such compute shaders, the new
    command DispatchComputeGroupSizeARB is used to specify both a work group
    size and a work group count.

    In this extension, compute shaders with fixed group sizes must be
    dispatched by DispatchCompute or DispatchComputeIndirect.  Compute
    shaders with variable group sizes must be dispatched via
    DispatchComputeGroupSizeARB.  No support is provided in this extension for
    indirect dispatch of compute shaders with a variable group size.
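
    Non-normative example:  the sketch below shows how an application might
    derive the work group count for a one-dimensional problem once it has
    chosen a group size at run time.  The helper name groups_needed is
    hypothetical and not part of this extension, and the commented call
    assumes a current context that exposes the new command with the usual
    "gl" prefix.

```c
#include <assert.h>

/* Hypothetical helper: number of work groups needed to cover <total_items>
 * invocations when each group holds <group_size> invocations (rounds up). */
unsigned int groups_needed(unsigned int total_items, unsigned int group_size)
{
    assert(group_size > 0);  /* a zero group size would be INVALID_VALUE */
    return total_items / group_size + (total_items % group_size != 0);
}

/* Usage with a variable-group-size program active (not executed here):
 *
 *   glDispatchComputeGroupSizeARB(groups_needed(n, 64), 1, 1,   64, 1, 1);
 */
```

    Because the last group may be only partially filled, a shader dispatched
    this way would normally compare gl_GlobalInvocationID against the
    problem size before touching memory.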

New Procedures and Functions

    void DispatchComputeGroupSizeARB(uint num_groups_x, uint num_groups_y,
                                     uint num_groups_z, uint group_size_x,
                                     uint group_size_y, uint group_size_z);

New Tokens

    Accepted by the <pname> parameter of GetIntegerv, GetBooleanv, GetFloatv,
    GetDoublev and GetInteger64v:

        MAX_COMPUTE_VARIABLE_GROUP_INVOCATIONS_ARB      0x9344
        MAX_COMPUTE_FIXED_GROUP_INVOCATIONS_ARB         0x90EB (see note)

    Accepted by the <pname> parameter of GetIntegeri_v, GetBooleani_v,
    GetFloati_v, GetDoublei_v and GetInteger64i_v:

        MAX_COMPUTE_VARIABLE_GROUP_SIZE_ARB             0x9345
        MAX_COMPUTE_FIXED_GROUP_SIZE_ARB                0x91BF (see note)

    Note:  MAX_COMPUTE_FIXED_GROUP_INVOCATIONS_ARB and
    MAX_COMPUTE_FIXED_GROUP_SIZE_ARB are aliases for the OpenGL 4.3 core enums
    MAX_COMPUTE_WORK_GROUP_INVOCATIONS and MAX_COMPUTE_WORK_GROUP_SIZE,
    respectively.
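
    Non-normative example:  the aliasing described in the note can be
    written out as C preprocessor definitions.  The "GL_" prefix follows
    the usual header convention; the #ifndef guards are an assumption made
    so the fragment can coexist with a header that already defines these
    names.

```c
/* New tokens for compute shaders with variable group sizes. */
#ifndef GL_MAX_COMPUTE_VARIABLE_GROUP_INVOCATIONS_ARB
#define GL_MAX_COMPUTE_VARIABLE_GROUP_INVOCATIONS_ARB 0x9344
#endif
#ifndef GL_MAX_COMPUTE_VARIABLE_GROUP_SIZE_ARB
#define GL_MAX_COMPUTE_VARIABLE_GROUP_SIZE_ARB        0x9345
#endif

/* New names for the limits on compute shaders with fixed group sizes. */
#ifndef GL_MAX_COMPUTE_FIXED_GROUP_INVOCATIONS_ARB
#define GL_MAX_COMPUTE_FIXED_GROUP_INVOCATIONS_ARB    0x90EB
#endif
#ifndef GL_MAX_COMPUTE_FIXED_GROUP_SIZE_ARB
#define GL_MAX_COMPUTE_FIXED_GROUP_SIZE_ARB           0x91BF
#endif

/* OpenGL 4.3 core enums that the FIXED tokens alias. */
#ifndef GL_MAX_COMPUTE_WORK_GROUP_INVOCATIONS
#define GL_MAX_COMPUTE_WORK_GROUP_INVOCATIONS         0x90EB
#endif
#ifndef GL_MAX_COMPUTE_WORK_GROUP_SIZE
#define GL_MAX_COMPUTE_WORK_GROUP_SIZE                0x91BF
#endif
```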


Modifications to the OpenGL 4.3 (Compatibility Profile) Specification

    Modify Chapter 19, Compute Shaders, p. 585

    (modify second paragraph, p. 585)

    ... One or more work groups is launched by calling

      void DispatchCompute(uint num_groups_x, uint num_groups_y,
                           uint num_groups_z)

    or

      void DispatchComputeGroupSizeARB(uint num_groups_x, uint num_groups_y,
                                       uint num_groups_z, uint group_size_x,
                                       uint group_size_y, uint group_size_z);

    (modify second paragraph, p. 586)

    For DispatchCompute, the local work size in each dimension must be
    specified at compile time in the active program for the compute shader
    stage.  The local work size is specified using an input layout qualifier
    ...

    (insert after second paragraph, p. 586)

    For DispatchComputeGroupSizeARB, the local work size must be specified as
    variable in the active program for the compute shader stage.  The group
    size used to execute the compute shader is taken from the <group_size_x>,
    <group_size_y>, and <group_size_z> parameters.  For the purposes of the
    COMPUTE_WORK_GROUP_SIZE query, a program without a local work size
    specified at compile time will be considered to have a size of zero in
    each dimension.

    (modify the third paragraph, p. 586)

    The maximum size of a local work group may be determined by calling
    GetIntegeri_v with <index> set to 0, 1, or 2 to retrieve the maximum work
    size in the X, Y and Z dimension, respectively.  <target> should be set to
    MAX_COMPUTE_FIXED_GROUP_SIZE_ARB for compute shaders with fixed group
    sizes or MAX_COMPUTE_VARIABLE_GROUP_SIZE_ARB for compute shaders with
    variable local group sizes.  Furthermore, the maximum number of
    invocations in a single local work group (i.e., the product of the three
    dimensions) may be determined by calling GetIntegerv with <pname> set to
    MAX_COMPUTE_FIXED_GROUP_INVOCATIONS_ARB for compute shaders with fixed
    group sizes or MAX_COMPUTE_VARIABLE_GROUP_INVOCATIONS_ARB for compute
    shaders with variable group sizes.
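
    Non-normative example:  the sketch below shows how the two limits for
    variable group sizes might jointly constrain a one-dimensional group
    size.  The helper name clamp_group_size_1d is hypothetical; its
    <max_size_x> and <max_invocations> parameters stand for values an
    application would have retrieved with GetIntegeri_v
    (MAX_COMPUTE_VARIABLE_GROUP_SIZE_ARB, index 0) and GetIntegerv
    (MAX_COMPUTE_VARIABLE_GROUP_INVOCATIONS_ARB).

```c
#include <assert.h>

/* Hypothetical helper: clamp a desired one-dimensional group size
 * (Y = Z = 1) to both limits that apply to variable group sizes. */
unsigned int clamp_group_size_1d(unsigned int desired,
                                 unsigned int max_size_x,
                                 unsigned int max_invocations)
{
    unsigned int size = desired;

    assert(max_size_x > 0 && max_invocations > 0);
    if (size > max_size_x)
        size = max_size_x;        /* per-dimension limit */
    if (size > max_invocations)
        size = max_invocations;   /* limit on the product X * Y * Z */
    if (size == 0)
        size = 1;                 /* a group size of zero is not allowed */
    return size;
}
```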

    (insert after the first INVALID_OPERATION error in the first error block,
     shared between DispatchCompute and DispatchComputeGroupSizeARB, p. 586)

    An INVALID_OPERATION error is generated by DispatchCompute if the active
    program for the compute shader stage has a variable work group
    size.

    An INVALID_OPERATION error is generated by DispatchComputeGroupSizeARB if
    the active program for the compute shader stage has a fixed work group
    size.

    (insert at the end of the first error block, shared between
     DispatchCompute and DispatchComputeGroupSizeARB, p. 586)

    An INVALID_VALUE error is generated by DispatchComputeGroupSizeARB if any
    of <group_size_x>, <group_size_y>, or <group_size_z> is less than or equal
    to zero or greater than the maximum local work group size for compute
    shaders with variable group size (MAX_COMPUTE_VARIABLE_GROUP_SIZE_ARB) in
    the corresponding dimension.

    An INVALID_VALUE error is generated by DispatchComputeGroupSizeARB if the
    product of <group_size_x>, <group_size_y>, and <group_size_z> exceeds the
    implementation-dependent maximum local work group invocation count for
    compute shaders with variable group size
    (MAX_COMPUTE_VARIABLE_GROUP_INVOCATIONS_ARB).
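
    Non-normative example:  the two INVALID_VALUE conditions above can be
    modelled in plain C as follows.  The function name check_group_size is
    hypothetical, and the limits are passed in as parameters rather than
    queried, so the sketch runs without a GL context.

```c
#ifndef GL_NO_ERROR
#define GL_NO_ERROR      0
#endif
#ifndef GL_INVALID_VALUE
#define GL_INVALID_VALUE 0x0501
#endif

/* Hypothetical model of the group size checks made by
 * DispatchComputeGroupSizeARB.  <max_size> holds the three values of
 * MAX_COMPUTE_VARIABLE_GROUP_SIZE_ARB and <max_invocations> the value of
 * MAX_COMPUTE_VARIABLE_GROUP_INVOCATIONS_ARB. */
unsigned int check_group_size(const unsigned int size[3],
                              const unsigned int max_size[3],
                              unsigned int max_invocations)
{
    unsigned long long product = 1;
    int i;

    for (i = 0; i < 3; ++i) {
        /* Each dimension must be non-zero and within its own limit. */
        if (size[i] == 0 || size[i] > max_size[i])
            return GL_INVALID_VALUE;
        /* Checking the running product each step keeps it from
         * overflowing the 64-bit accumulator. */
        product *= size[i];
        if (product > max_invocations)
            return GL_INVALID_VALUE;
    }
    return GL_NO_ERROR;
}
```

    With the minimum limits required by this extension (512/512/64 and 512
    invocations), a 16x16x2 group is accepted while a 32x32x1 group is
    rejected because its 1024 invocations exceed the invocation limit.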

    (insert at the end of the first error block, for DispatchComputeIndirect,
     p. 587)

    An INVALID_OPERATION error is generated if the active program for the
    compute shader stage has a variable work group size.
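
    Non-normative example:  taken together, the INVALID_OPERATION rules
    pair each dispatch command with exactly one kind of program.  The enum
    and function names below are hypothetical and serve only to tabulate
    that pairing.

```c
#ifndef GL_NO_ERROR
#define GL_NO_ERROR          0
#endif
#ifndef GL_INVALID_OPERATION
#define GL_INVALID_OPERATION 0x0502
#endif

/* Hypothetical labels for the two kinds of compute program. */
enum group_size_kind { GROUP_SIZE_FIXED, GROUP_SIZE_VARIABLE };

/* Hypothetical labels for the three dispatch commands. */
enum dispatch_command {
    DISPATCH_COMPUTE,
    DISPATCH_COMPUTE_INDIRECT,
    DISPATCH_COMPUTE_GROUP_SIZE_ARB
};

/* Only DispatchComputeGroupSizeARB accepts a variable group size program;
 * the other two commands accept only fixed group size programs. */
unsigned int check_dispatch_command(enum dispatch_command cmd,
                                    enum group_size_kind kind)
{
    int needs_variable = (cmd == DISPATCH_COMPUTE_GROUP_SIZE_ARB);
    int is_variable = (kind == GROUP_SIZE_VARIABLE);

    return needs_variable == is_variable ? GL_NO_ERROR
                                         : GL_INVALID_OPERATION;
}
```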


Modifications to the OpenGL Shading Language Specification, Version 4.30

    Including the following line in a shader can be used to control the
    language features described in this extension:

      #extension GL_ARB_compute_variable_group_size : <behavior>

    where <behavior> is as specified in section 3.3.

    New preprocessor #defines are added to the OpenGL Shading Language:

      #define GL_ARB_compute_variable_group_size        1


    Modify Section 4.4.1.4, Compute Shader Inputs (p. 59)

    (add to list of layout qualifiers for compute shader inputs, p. 59)

      layout-qualifier-id
        local_size_x = integer-constant
        local_size_y = integer-constant
        local_size_z = integer-constant
        local_size_variable

    (modify the last paragraph, p. 59)

    The local_size_x, local_size_y, and local_size_z qualifiers are used to
    declare a fixed local group size for the kernel in the first, second...

    (modify the second to last paragraph in the section)

    If the fixed local group size of the shader in any dimension...
    ... If multiple compute shaders attached to a single program object declare
    a fixed local group size, the declarations must be identical; otherwise a
    link-time error results.

    (insert before the last paragraph of the section, p. 60)

    The *local_size_variable* qualifier is used to declare that
    the local group size of the shader is variable, and will be specified
    using arguments to OpenGL API compute dispatch commands.  If a compute
    shader including a *local_size_variable* qualifier also declares a
    fixed local group size using the *local_size_x*, *local_size_y*, or
    *local_size_z* qualifiers, a compile-time error results.  If one compute
    shader attached to a program declares a variable local group size and a
    second compute shader attached to the same program declares a fixed
    local group size, a link-time error results.
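
    Non-normative example:  a compute shader declaring a variable local
    group size might look like the source below, shown as the C string an
    application would hand to ShaderSource.  The buffer block "Data", its
    binding point, and the doubling operation are assumptions chosen only
    for illustration.

```c
/* Illustrative GLSL source for a compute shader with a variable local
 * group size.  Block name, binding, and the work done are assumptions. */
const char variable_group_shader_source[] =
    "#version 430\n"
    "#extension GL_ARB_compute_variable_group_size : require\n"
    "\n"
    "layout(local_size_variable) in;\n"
    "\n"
    "layout(std430, binding = 0) buffer Data { float values[]; };\n"
    "\n"
    "void main()\n"
    "{\n"
    "    uint i = gl_GlobalInvocationID.x;\n"
    "    if (i < uint(values.length()))\n"
    "        values[i] *= 2.0;\n"
    "}\n";
```

    Adding a *local_size_x* declaration to this shader would produce the
    compile-time error described above.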

    (modify last paragraph of the section, p. 60, which specified link errors
     if *local_size* layout qualifiers were omitted)

    Furthermore, if a program object contains any compute shaders, at least
    one must contain an input layout qualifier specifying a fixed or variable
    local group size for the program, or a link-time error will occur.
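
    Non-normative example:  the link-time rules for combining declarations
    across the compute shaders attached to one program can be sketched as
    below.  All names are hypothetical, and the sketch deliberately omits
    the separate requirement that multiple fixed declarations be identical.

```c
#include <assert.h>

/* What a single compute shader declares about its local group size. */
enum group_size_decl { DECL_NONE, DECL_FIXED, DECL_VARIABLE };

/* Outcome of combining the declarations of all attached compute shaders. */
enum link_result { LINK_ERROR, LINK_FIXED, LINK_VARIABLE };

/* Hypothetical model: <decls> holds one entry per attached compute shader.
 * Does not check that multiple fixed declarations agree with each other. */
enum link_result link_group_size(const enum group_size_decl *decls, int count)
{
    int fixed = 0, variable = 0, i;

    assert(count > 0);  /* the rules apply only if compute shaders exist */
    for (i = 0; i < count; ++i) {
        if (decls[i] == DECL_FIXED)
            fixed = 1;
        if (decls[i] == DECL_VARIABLE)
            variable = 1;
    }
    if (fixed && variable)
        return LINK_ERROR;   /* mixing fixed and variable is an error */
    if (!fixed && !variable)
        return LINK_ERROR;   /* at least one shader must declare a size */
    return fixed ? LINK_FIXED : LINK_VARIABLE;
}
```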


    Modify Section 7.1, Built-In Language Variables, p. 110

    (add to list of compute built-ins, p. 110)

      in    uvec3 gl_NumWorkGroups;     // already exists in 4.30
      const uvec3 gl_WorkGroupSize;     // already exists in 4.30
      in    uvec3 gl_LocalGroupSizeARB; // new!

    (modify third paragraph, p. 113)

    The built-in constant gl_WorkGroupSize is a compute-shader constant ...
    It is a compile-time error to use gl_WorkGroupSize in a shader that does
    not declare a fixed local group size, or before that shader has declared
    a fixed local group size, using local_size_x, local_size_y, and
    local_size_z.   ...

    (insert after third paragraph, p. 113)

    The built-in variable /gl_LocalGroupSizeARB/ is a compute-shader input
    variable containing the local work group size for the current compute-
    shader work group.  For compute shaders with a fixed local group size (using
    *local_size_x*, *local_size_y*, or *local_size_z* layout qualifiers), its
    value will be the same as the constant /gl_WorkGroupSize/.  For compute
    shaders with a variable local group size (using *local_size_variable*),
    the value of /gl_LocalGroupSizeARB/ will be the work
    group size specified in the OpenGL API command dispatching the current
    compute shader work.

    (modify next-to-last paragraph, p. 113)

    The built-in variable gl_LocalInvocationID ...  The possible values for
    this variable range across the local work group size, i.e., (0,0,0) to
    (gl_LocalGroupSizeARB.x - 1, gl_LocalGroupSizeARB.y - 1,
    gl_LocalGroupSizeARB.z - 1).

    (modify last paragraph, p. 113)

    The built-in variable gl_GlobalInvocationID ...  This is computed as:

      gl_GlobalInvocationID = gl_WorkGroupID * gl_LocalGroupSizeARB +
                              gl_LocalInvocationID;


    (modify first paragraph, p. 114)

    The built-in variable gl_LocalInvocationIndex ...  This is computed as:

      gl_LocalInvocationIndex =
        gl_LocalInvocationID.z * (gl_LocalGroupSizeARB.x *
                                  gl_LocalGroupSizeARB.y) +
        gl_LocalInvocationID.y * gl_LocalGroupSizeARB.x +
        gl_LocalInvocationID.x;
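
    Non-normative example:  the two formulas above can be checked on the
    CPU with the C model below.  The struct uvec3 type and both function
    names are hypothetical stand-ins for the GLSL built-ins.

```c
/* Stand-in for the GLSL uvec3 type. */
struct uvec3 { unsigned int x, y, z; };

/* gl_GlobalInvocationID =
 *     gl_WorkGroupID * gl_LocalGroupSizeARB + gl_LocalInvocationID */
struct uvec3 global_invocation_id(struct uvec3 work_group_id,
                                  struct uvec3 local_group_size,
                                  struct uvec3 local_invocation_id)
{
    struct uvec3 id;

    id.x = work_group_id.x * local_group_size.x + local_invocation_id.x;
    id.y = work_group_id.y * local_group_size.y + local_invocation_id.y;
    id.z = work_group_id.z * local_group_size.z + local_invocation_id.z;
    return id;
}

/* gl_LocalInvocationIndex: the local ID flattened in X-major order. */
unsigned int local_invocation_index(struct uvec3 local_invocation_id,
                                    struct uvec3 local_group_size)
{
    return local_invocation_id.z *
               (local_group_size.x * local_group_size.y) +
           local_invocation_id.y * local_group_size.x +
           local_invocation_id.x;
}
```

    For a group size of (8,4,2), the invocation with local ID (3,2,1) in
    work group (2,1,0) has global ID (19,6,1) and local index
    1*(8*4) + 2*8 + 3 = 51.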


Additions to the AGL/EGL/GLX/WGL Specifications

    None

GLX Protocol

    TBD

Dependencies on NV_compute_program5

    If NV_compute_program5 is supported, variable work group sizes are
    supported for assembly programs.  Make the following edits to the
    NV_compute_program5 specification:

    (modify the NV_compute_program5 edits to Section 2.X.3.2, Program
     Attribute Variables)

    If a compute attribute binding matches "invocation.groupsize", the "x",
    "y", and "z" components of the invocation attribute variable are filled
    with the "x", "y", and "z" dimensions, respectively, of the local work
    group, as specified by the GROUP_SIZE declaration for programs with
    fixed-size work groups or through the OpenGL API for programs with
    variable-size work groups.  The "w" component of the attribute is
    undefined.

    (add to section 2.X.6 of the NV_gpu_program4/5 spec, Program Options)

    + Compute Shader Variable Group Size (ARB_compute_variable_group_size)

    If a program specifies the "ARB_compute_variable_group_size" option, it
    supports variable-size work groups.  Compute programs with a variable work
    group size must be dispatched with DispatchComputeGroupSizeARB.  Compute
    programs with a fixed work group size must be dispatched with
    DispatchCompute or DispatchComputeIndirect.

    (modify Section 2.X.7.Y, Compute Program Declarations)

    - Shader Thread Group Size (GROUP_SIZE)

    The GROUP_SIZE statement declares the number of shader threads in a one-,
    two-, or three-dimensional local work group.  The statement must have one
    to three unsigned integer arguments.  Each argument must be less than or
    equal to the value of the implementation-dependent limit
    MAX_COMPUTE_LOCAL_WORK_SIZE for its corresponding dimension (X, Y, or Z).
    If the ARB_compute_variable_group_size option is specified, no fixed group
    size should be specified and a program will fail to load if it includes
    any GROUP_SIZE declaration.  If the ARB_compute_variable_group_size option
    is not specified, a program will fail to load unless it contains exactly
    one GROUP_SIZE declaration.
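
    Non-normative example:  the load-time rule relating the program option
    to GROUP_SIZE declarations reduces to the small predicate below.  The
    function name is hypothetical, and the per-dimension limit check on
    GROUP_SIZE arguments is not modelled.

```c
/* Hypothetical model of the load-time rule for assembly compute programs.
 * <has_variable_option> is non-zero if the program specifies the
 * ARB_compute_variable_group_size option; <group_size_decl_count> is the
 * number of GROUP_SIZE declarations it contains.
 * Returns non-zero if the program satisfies the rule. */
int compute_program_loads(int has_variable_option, int group_size_decl_count)
{
    if (has_variable_option)
        return group_size_decl_count == 0;  /* no fixed size allowed */
    return group_size_decl_count == 1;      /* exactly one is required */
}
```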

Errors

    An INVALID_OPERATION error is generated by DispatchCompute or
    DispatchComputeIndirect if the active program for the compute shader stage
    has a variable work group size.

    An INVALID_OPERATION error is generated by DispatchComputeGroupSizeARB if
    the active program for the compute shader stage has a fixed work group
    size.

    An INVALID_VALUE error is generated by DispatchComputeGroupSizeARB if any
    of <group_size_x>, <group_size_y>, or <group_size_z> is less than or equal
    to zero or greater than the maximum local work group size for compute
    shaders with variable group size (MAX_COMPUTE_VARIABLE_GROUP_SIZE_ARB) in
    the corresponding dimension.

    An INVALID_VALUE error is generated by DispatchComputeGroupSizeARB if the
    product of <group_size_x>, <group_size_y>, and <group_size_z> exceeds the
    implementation-dependent maximum local work group invocation count for
    compute shaders with variable group size
    (MAX_COMPUTE_VARIABLE_GROUP_INVOCATIONS_ARB).

New State

    None.

New Implementation Dependent State

    Add to Table 23.73 (Implementation Dependent Compute Shader Limits),
    p. 716

                                                    Minimum
    Get Value                  Type  Get Command     Value     Description                    Sec.
    -------------------------  ----  -------------  ---------  ----------------------------   ------
    MAX_COMPUTE_VARIABLE_      3xZ+  GetIntegeri_v  512 (x,y)  maximum local group size for   19
      GROUP_SIZE_ARB                                64 (z)     compute shaders with variable
                                                               group size (per dimension)
    MAX_COMPUTE_VARIABLE_      Z+    GetIntegerv    512        maximum number of invocations  19
      GROUP_INVOCATIONS_ARB                                    in a group for compute shaders
                                                               with variable group size

    In Table 23.73, rename entries for "MAX_COMPUTE_WORK_GROUP_SIZE" and
    "MAX_COMPUTE_WORK_GROUP_INVOCATIONS" to use the labels
    "MAX_COMPUTE_FIXED_GROUP_SIZE_ARB" and
    "MAX_COMPUTE_FIXED_GROUP_INVOCATIONS_ARB", respectively.  Also modify the
    description of these entries to refer to "compute shaders with fixed group
    size".

Issues

    (1) If a compute shader declares a work group size, can it be dispatched
        using OpenGL APIs accepting an explicit work group size as part of the
        command?  If so, what happens?

      RESOLVED:  No.  Attempting to do so will generate an INVALID_OPERATION
      error.

      Since the fixed work group size may affect the compilation of the shader
      and the value of certain built-in constants, having the OpenGL API
      override the work group size baked into the compute shader seems
      suspect.  We could conceivably allow an explicit work group size in the
      OpenGL API and require that it match the work group size baked into the
      compute shader, but doing so seems to be of limited value.

    (2) If a compute shader doesn't declare a work group size, can it be
        dispatched using OpenGL APIs that do not accept an explicit work group
        size as part of the command?  If so, what happens?

      RESOLVED:  No.  Attempting to do so will generate an INVALID_OPERATION
      error.

      We could theoretically treat this case as allowing OpenGL
      implementations to pick a work group size that "works well" on a
      particular piece of hardware.  However, that wouldn't resolve the
      question of what the "num_groups" arguments to DispatchCompute would
      mean if the group size were implementation-dependent.  One could
      interpret the "num_groups" arguments as specifying the number of
      *invocations* in each dimension, as though the group size were 1x1x1.
      But it's just easier to make this condition an error, as we do for APIs
      attempting to override the group size of a compute shader.

    (3) What new GLSL built-ins should we provide to expose the group size
        specified in the OpenGL API?

      RESOLVED:  We will provide a new built-in variable exposing the group
      size specified in the API.  The name choice is potentially tricky, since
      we now have two different "work group size" variables -- a previously
      existing constant for the fixed work group size and now a second input
      for the variable work group size specified in the API.  We choose the
      name "gl_LocalGroupSizeARB" here, which seems to fit reasonably well with
      existing inputs such as "gl_LocalInvocationID".

      If we had provided this functionality in the original compute shader
      extension, maybe we could have only had "gl_LocalGroupSizeARB"?
      However, the constant "gl_WorkGroupSize" would still be useful for
      sizing built-in arrays for shaders with a fixed work group size.  For
      example, a shader might want to declare a shared variable with one
      instance per work group invocation, such as:

        shared float shared_values[gl_WorkGroupSize.x * gl_WorkGroupSize.y *
                                   gl_WorkGroupSize.z];

      Such declarations would be illegal using the input
      "gl_LocalGroupSizeARB".

    (4) Do we need to modify the behavior of existing GLSL built-ins for
        compute shaders without an explicit work group size?

      RESOLVED:  No, not really.

      The constant gl_WorkGroupSize seems like it would be affected by
      omitting an explicit work group size.  However, it is already an error
      to use gl_WorkGroupSize in a shader before a work group size layout
      qualifier is declared.  That would make its use illegal in shaders where
      work group size layout qualifiers are not declared at all.

      We do need to make minor modifications to the language describing other
      built-in inputs such as gl_LocalInvocationIndex, that are today defined
      to be a function of the constant gl_WorkGroupSize.  We modify these
      definitions to use the input gl_LocalGroupSizeARB instead.

    (5) Should we provide a function (e.g.,
        DispatchComputeIndirectGroupSizeARB) that takes both a work group
        count and a work group size from indirect dispatch buffers?  If so,
        what do we do if the work group size is not positive or exceeds
        implementation-dependent limits?

      RESOLVED:  No, let's leave this out of this extension.

    (6) Is it necessary for compute shaders to include a "#extension"
        directive to enable this extension in order to link successfully
        without a fixed work group size?

      RESOLVED:  Yes, compute shaders will have to use the
      "local_size_variable" layout qualifier to declare a variable work group
      size, and an "#extension" directive is required to be able to use that
      layout qualifier.

      In unextended OpenGL 4.3, we get a link error if no shaders in the
      program exercise an existing language feature (declaring the fixed work
      group size).  We could have simply removed this error, but the general
      rule for "#extension" is that a user should be able to determine if a
      shader were legal or not simply by examining the source code.

      Note that it is necessary to use "#extension" to use the new built-in
      input (gl_LocalGroupSizeARB) provided by this extension.

    (7) Do we need different implementation-dependent limits for dynamic group
        sizes?

      RESOLVED:  Yes, some implementations of this extension may require lower
      limits for variable local group sizes.  We add new tokens
      MAX_COMPUTE_VARIABLE_GROUP_SIZE_ARB and
      MAX_COMPUTE_VARIABLE_GROUP_INVOCATIONS_ARB to query these limits.
      Implementations must support variable group dimensions of 512/512/64,
      with at least 512 invocations per group.  The minimum limits for fixed
      group sizes in unextended OpenGL 4.3 are 1024/1024/64 with at least 1024
      invocations per group.

    (8) Do we need an explicit query to determine if a program with a compute
        shader has a fixed or variable local group size?

      RESOLVED:  No.  The existing COMPUTE_WORK_GROUP_SIZE query will return
      zero when using a shader with a variable local group size, and will
      always return non-zero values for shaders with a fixed group size.
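
      Non-normative example:  an application could classify a linked
      program from the result of the existing query as sketched below.  The
      helper name is hypothetical; it takes the three values that a call
      such as GetProgramiv with COMPUTE_WORK_GROUP_SIZE would write.

```c
/* Hypothetical helper: returns non-zero if the queried work group size
 * indicates a variable local group size (zero in each dimension).
 *
 * Typical source of <work_group_size> (not executed here):
 *
 *   GLint size[3];
 *   glGetProgramiv(program, GL_COMPUTE_WORK_GROUP_SIZE, size);
 */
int has_variable_group_size(const int work_group_size[3])
{
    return work_group_size[0] == 0 &&
           work_group_size[1] == 0 &&
           work_group_size[2] == 0;
}
```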

Revision History

    Revision 8, May 30, 2013 (pbrown)
      - Fix a typo in the MAX_COMPUTE_VARIABLE_GROUP_SIZE_ARB description;
        that limit applies only to shaders with variable group sizes.

    Revision 7, May 30, 2013 (pbrown)
      - Mark issue (8) as resolved.

    Revision 6, May 12, 2013 (JohnK)
      - Editorial things:
         - be more consistent/broader with "fixed local group size" language
           (vs. variable), and related, also bringing in another paragraph from
           the core spec.
         - move spec. more toward using bold layout qualifier ids everywhere
         - few minor typos, other tiny changes

    Revision 5, May 8, 2013
      - Assign enum values for new tokens.
      - Add interaction with NV_compute_program5 assembly programs.

    Revision 4, May 7, 2013
      - Add new implementation limits MAX_COMPUTE_VARIABLE_GROUP_SIZE_ARB and
        MAX_COMPUTE_VARIABLE_GROUP_INVOCATIONS_ARB for compute shaders with
        variable group sizes, with minimum values of 512/512/64 and 512,
        respectively.
      - Add new tokens MAX_COMPUTE_FIXED_GROUP_SIZE_ARB and
        MAX_COMPUTE_FIXED_GROUP_INVOCATIONS_ARB for compute shaders with fixed
        group sizes, which are aliased to existing OpenGL 4.3 tokens
        (MAX_COMPUTE_WORK_GROUP_SIZE and MAX_COMPUTE_WORK_GROUP_INVOCATIONS).

    Revision 3, May 4, 2013
      - Add ARB suffixes for the new entry point (DispatchComputeGroupSizeARB)
        and GLSL built-in variable (gl_LocalGroupSizeARB).
      - Add a missing INVALID_OPERATION error to DispatchComputeIndirect,
        which requires a compute shader with a fixed local group size.
      - Add new issue (8) about querying if a program with a compute shader
        has a fixed or variable group size.

    Revision 2, May 3, 2013
      - Modify the spec to accept an explicit layout qualifier
        /local_size_variable/ to specify a compute shader with a variable
        local group size instead of inferring it from the lack of fixed-size
        layout qualifiers.
      - Modify some spec language to refer to the existing and new types of
        compute shaders as having a fixed and variable local group size,
        respectively.
      - Mark various issues as resolved based on work group discussions.
      - Add new issue (7) about different implementation-dependent size limits
        for compute shaders with variable-size local work groups.

    Revision 1, January 20, 2013
      - Initial revision.
