Name

    ARB_gpu_shader5

Name Strings

    GL_ARB_gpu_shader5

Contact

    Pat Brown, NVIDIA Corporation (pbrown 'at' nvidia.com)

Contributors

    Barthold Lichtenbelt, NVIDIA
    Bill Licea-Kane, AMD
    Bruce Merry, ARM
    Chris Dodd, NVIDIA
    Eric Werness, NVIDIA
    Graham Sellers, AMD
    Greg Roth, NVIDIA
    Jeff Bolz, NVIDIA
    Nick Haemel, AMD
    Pierre Boudier, AMD
    Piers Daniell, NVIDIA

Notice

    Copyright (c) 2010-2013 The Khronos Group Inc. Copyright terms at
        http://www.khronos.org/registry/speccopyright.html

Status

    Complete. Approved by the ARB at the 2010/01/22 F2F meeting.
    Approved by the Khronos Board of Promoters on March 10, 2010.

Version

    Version 16, March 30, 2012

Number

    ARB Extension #88

Dependencies

    This extension is written against the OpenGL 3.2 (Compatibility Profile)
    Specification.

    This extension is written against Version 1.50 (Revision 09) of the OpenGL
    Shading Language Specification.

    OpenGL 3.2 and GLSL 1.50 are required.

    This extension interacts with ARB_gpu_shader_fp64.

    This extension interacts with NV_gpu_shader5.

    This extension interacts with ARB_sample_shading.

    This extension interacts with ARB_texture_gather.

Overview

    This extension provides a set of new features to the OpenGL Shading
    Language and related APIs to support capabilities of new GPUs, extending
    the capabilities of version 1.50 of the OpenGL Shading Language.  Shaders
    using the new functionality provided by this extension should enable this
    functionality via the construct

      #extension GL_ARB_gpu_shader5 : require     (or enable)

    This extension provides a variety of new features for all shader types,
    including:

      * support for indexing into arrays of samplers using non-constant
        indices, as long as the index doesn't diverge if multiple shader
        invocations are run in lockstep;

      * extending the uniform block capability of OpenGL 3.1 and 3.2 to allow
        shaders to index into an array of uniform blocks;

      * support for implicitly converting signed integer types to unsigned
        types, as well as more general implicit conversion and function
        overloading infrastructure to support new data types introduced by
        other extensions;

      * a "precise" qualifier allowing computations to be carried out exactly
        as specified in the shader source to avoid optimization-induced
        invariance issues (which might cause cracking in tessellation);

      * new built-in functions supporting:

        * fused floating-point multiply-add operations;

        * splitting a floating-point number into a significand and exponent
          (frexp), or building a floating-point number from a significand and
          exponent (ldexp);

        * integer bitfield manipulation, including functions to find the
          position of the most or least significant set bit, count the number
          of one bits, and bitfield insertion, extraction, and reversal;

        * packing and unpacking vectors of small fixed-point data types into a
          larger scalar; and

        * converting floating-point values to or from their integer bit
          encodings;

      * extending the textureGather() built-in functions provided by
        ARB_texture_gather:

        * allowing shaders to select any single component of a multi-component
          texture to produce the gathered 2x2 footprint;

        * allowing shaders to perform a per-sample depth comparison when
          gathering the 2x2 footprint using shadow sampler types;

        * allowing shaders to use arbitrary offsets computed at run-time to
          select a 2x2 footprint to gather from; and

        * allowing shaders to use separate independent offsets for each of the
          four texels returned, instead of requiring a fixed 2x2 footprint.

    This extension also provides some new capabilities for individual
    shader types, including:

      * support for instanced geometry shaders, where a geometry shader may be
        run multiple times for each primitive, including a built-in
        gl_InvocationID to identify the invocation number;

      * support for emitting vertices in a geometry shader where each vertex
        emitted may be directed independently at a specified vertex stream (as
        provided by ARB_transform_feedback3), and where each shader output is
        associated with a stream;

      * support for reading a mask of covered samples in a fragment shader;
        and

      * support for interpolating a fragment shader input at a programmable
        offset relative to the pixel center, a programmable sample number, or
        at the centroid.

IP Status

    No known IP claims.

New Procedures and Functions

    None

New Tokens

    Accepted by the <pname> parameter of GetProgramiv:

        GEOMETRY_SHADER_INVOCATIONS                     0x887F

    Accepted by the <pname> parameter of GetBooleanv, GetIntegerv, GetFloatv,
    GetDoublev, and GetInteger64v:

        MAX_GEOMETRY_SHADER_INVOCATIONS                 0x8E5A
        MIN_FRAGMENT_INTERPOLATION_OFFSET               0x8E5B
        MAX_FRAGMENT_INTERPOLATION_OFFSET               0x8E5C
        FRAGMENT_INTERPOLATION_OFFSET_BITS              0x8E5D
        MAX_VERTEX_STREAMS                              0x8E71

    (note:  MAX_GEOMETRY_SHADER_INVOCATIONS,
     MIN_FRAGMENT_INTERPOLATION_OFFSET, MAX_FRAGMENT_INTERPOLATION_OFFSET, and
     FRAGMENT_INTERPOLATION_OFFSET_BITS have identical values to corresponding
     "NV" enums from NV_gpu_program5.  MAX_VERTEX_STREAMS is also defined in
     ARB_transform_feedback3.)


Additions to Chapter 2 of the OpenGL 3.2 (Compatibility Profile) Specification
(OpenGL Operation)

    Modify Section 2.15.4, Geometry Shader Execution Environment, p. 121

    (add two unnumbered subsections after "Texture Access", p. 122)

    Instanced Geometry Shaders

    For each input primitive received by the geometry shader pipeline stage,
    the geometry shader may be run once or multiple times.  The number of
    times a geometry shader should be executed for each input primitive may be
    specified using a layout qualifier in a geometry shader of a linked
    program.  If the invocation count is not specified in any layout
    qualifier, the invocation count will be one.

    Each separate geometry shader invocation is assigned a unique invocation
    number.  For a geometry shader with <N> invocations, each input primitive
    spawns <N> invocations, numbered 0 through <N>-1.  The built-in input
    variable gl_InvocationID may be used by a geometry shader invocation to
    determine its invocation number.

    When executing instanced geometry shaders, the output primitives generated
    from each input primitive are passed to subsequent pipeline stages using
    the shader invocation number to order the output.  The first primitives
    received by the subsequent pipeline stages are those emitted by the shader
    invocation numbered zero, followed by those from the shader invocation
    numbered one, and so forth.  Additionally, all output primitives generated
    from a given input primitive are passed to subsequent pipeline stages
    before any output primitives generated from subsequent input primitives.
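
    As an informal illustration of the ordering rule above, the following C
    sketch computes the position at which the output of one invocation is
    consumed by later pipeline stages.  The helper name and the flat "slot"
    numbering are assumptions made for this example only; they are not part
    of the API.

```c
#include <stddef.h>

/* Hypothetical helper (not a GL entry point): position, in consumption
 * order, of the output emitted by invocation <invocation> for input
 * primitive <primitive>, when each input primitive spawns <n>
 * invocations.  All output of primitive p precedes that of primitive
 * p+1; within one primitive, output is ordered by invocation number. */
size_t gs_output_slot(size_t primitive, size_t invocation, size_t n)
{
    return primitive * n + invocation;
}
```

    With six invocations per primitive, the output of invocation 5 for
    primitive 0 lands in slot 5, ahead of invocation 0 for primitive 1 in
    slot 6.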


    Geometry Shader Vertex Streams

    Geometry shaders may emit primitives to multiple independent vertex
    streams.  Each vertex emitted by the geometry shader is directed at one of
    the vertex streams.  As vertices are received on each stream, they are
    arranged into primitives of the type specified by the geometry shader
    output primitive type.  The shading language built-in functions
    EndPrimitive() and EndStreamPrimitive() may be used to end the primitive
    being assembled on a given vertex stream and start a new empty primitive
    of the same type.  If an implementation supports <N> vertex streams, the
    individual streams are numbered 0 through <N>-1.  There is no requirement
    on the order of the streams to which vertices are emitted, and the number
    of vertices emitted to each stream may be completely independent, subject
    only to implementation-dependent output limits.

    The primitives emitted to all vertex streams are passed to the transform
    feedback stage to be captured and written to buffer objects in the manner
    specified by the transform feedback state.  The primitives emitted to all
    streams but stream zero are discarded after transform feedback.
    Primitives emitted to stream zero are passed to subsequent pipeline stages
    for clipping, rasterization, and subsequent fragment processing.

    Geometry shaders that emit vertices to multiple vertex streams are
    currently limited to using only the "points" output primitive type.  A
    program will fail to link if it includes a geometry shader that calls the
    EmitStreamVertex() built-in function and has any other output primitive
    type parameter.
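
    The two stream rules above (only stream zero is rasterized, and
    multi-stream shaders must output points) can be sketched in C as
    follows.  The type and function names are hypothetical and serve only
    to restate the text; a real implementation performs these checks
    internally.

```c
#include <stdbool.h>

/* Hypothetical mirror of the geometry shader output primitive types. */
typedef enum {
    OUT_POINTS,
    OUT_LINE_STRIP,
    OUT_TRIANGLE_STRIP
} OutputPrimitive;

/* Primitives on every stream go to transform feedback, but only
 * stream zero continues to clipping and rasterization. */
bool stream_reaches_rasterizer(int stream)
{
    return stream == 0;
}

/* Link rule restated from the text: a geometry shader that calls
 * EmitStreamVertex() must use the "points" output primitive type. */
bool gs_streams_link_ok(bool calls_emit_stream_vertex, OutputPrimitive out)
{
    return !calls_emit_stream_vertex || out == OUT_POINTS;
}
```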


Additions to Chapter 3 of the OpenGL 3.2 (Compatibility Profile) Specification
(Rasterization)

    Modify Section 3.3.1, Multisampling, p. 148

    (add new paragraph at the end of the section, p. 149)

    If MULTISAMPLE is enabled and the current program object includes a
    fragment shader with one or more input variables qualified with "sample
    in", the data associated with those variables will be assigned
    independently.  The values for each sample must be evaluated at the
    location of the sample.  The data associated with any other variables not
    qualified with "sample in" need not be evaluated independently for each
    sample.


    Modify ARB_texture_gather, "Changes to Section 3.8.8"

    (extend language describing the operation of textureGather, allowing the
     new <comp> argument to select any of the four components from a
     multi-component texel vector)

    The textureGather and textureGatherOffset built-in shader functions...  A
    four-component vector is then assembled by taking a single component from
    the swizzled texture source colors of the four texels, in the order
    T_i0_j1, T_i1_j1, T_i1_j0, and T_i0_j0.  The selected component is
    identified by the optional <comp> argument, where the values zero, one,
    two, and three identify the Rs, Gs, Bs, or As component, respectively.  If
    <comp> is omitted, it is treated as identifying the Rs component.
    Incomplete textures (section 3.8.10) are considered to return a texture
    source color of (0,0,0,1) for all four source texels.

    (add further language describing textureGatherOffsets)

    The textureGatherOffsets built-in functions from the OpenGL Shading
    Language return a vector derived from sampling four texels in the image
    array of level <level_base>.  For each of the four texel offsets specified
    by the <offsets> argument, the rules for the LINEAR minification filter
    are applied to identify a 2x2 texel footprint, from which the single texel
    T_i0_j0 is selected.  A four-component vector is then assembled by taking
    a single component from each of the four T_i0_j0 texels in the same manner
    as for the textureGather function.
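
    The component selection and texel ordering described for textureGather
    can be modelled with a short C sketch.  The Texel and Footprint types
    are assumptions of this example; it presumes that the 2x2 footprint has
    already been located and swizzled, and that <comp> is in the range 0
    through 3.

```c
/* One texel after swizzling: c[0..3] hold Rs, Gs, Bs, As. */
typedef struct { float c[4]; } Texel;

/* The 2x2 footprint identified by the LINEAR filter rules. */
typedef struct { Texel i0_j0, i1_j0, i0_j1, i1_j1; } Footprint;

/* Assemble the gathered vector for component <comp> (0..3) in the
 * order T_i0_j1, T_i1_j1, T_i1_j0, T_i0_j0.  The caller must pass a
 * valid <comp>; this sketch does not check it. */
void gather_component(const Footprint *fp, int comp, float out[4])
{
    out[0] = fp->i0_j1.c[comp];
    out[1] = fp->i1_j1.c[comp];
    out[2] = fp->i1_j0.c[comp];
    out[3] = fp->i0_j0.c[comp];
}
```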


    Modify Section 3.12.1, Shader Variables, p. 273

    (insert prior to the last paragraph of the section, p. 274)

    When interpolating built-in and user-defined varying variables, the default
    screen-space location at which these variables are sampled is defined in
    previous rasterization sections.  The default location may be overridden by
    interpolation qualifiers.  When interpolating variables declared using
    "centroid in", the variable is sampled at a location within the pixel
    covered by the primitive generating the fragment.  When interpolating
    variables declared using "sample in" when MULTISAMPLE is enabled, the
    fragment shader will be invoked separately for each covered sample and the
    variable will be sampled at the corresponding sample point.

    Additionally, built-in fragment shader functions provide further
    fine-grained control over interpolation.  The built-in functions
    interpolateAtCentroid() and interpolateAtSample() will sample variables as
    though they were declared with the "centroid" or "sample" qualifiers,
    respectively.  The built-in function interpolateAtOffset() will sample
    variables at a specified (x,y) offset relative to the center of the pixel.
    The range and granularity of offsets supported by this function is
    implementation-dependent.  If either component of the specified offset is
    less than MIN_FRAGMENT_INTERPOLATION_OFFSET or greater than
    MAX_FRAGMENT_INTERPOLATION_OFFSET, the position used to interpolate the
    variable is undefined.  Not all values of <offset> may be supported; x and
    y offsets may be rounded to fixed-point values with the number of fraction
    bits given by the implementation-dependent constant
    FRAGMENT_INTERPOLATION_OFFSET_BITS.
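
    One way to picture the range check and fixed-point rounding described
    above is the following C sketch for a single offset component.  The
    function name is hypothetical, and round-to-nearest is an assumption:
    the text only states that offsets "may be rounded" and does not fix the
    rounding mode.

```c
#include <stdbool.h>

/* Hypothetical model of one component of an interpolateAtOffset()
 * offset.  <min_off>, <max_off> and <bits> stand for the values
 * queried for MIN_FRAGMENT_INTERPOLATION_OFFSET,
 * MAX_FRAGMENT_INTERPOLATION_OFFSET and
 * FRAGMENT_INTERPOLATION_OFFSET_BITS.
 * Returns false when the offset is out of range (the interpolation
 * position is then undefined); otherwise stores the offset snapped to
 * a multiple of 2^-bits.  Round-to-nearest is assumed here. */
bool quantize_offset(float offset, float min_off, float max_off,
                     int bits, float *snapped)
{
    if (offset < min_off || offset > max_off)
        return false;

    float scale = (float)(1 << bits);
    float scaled = offset * scale;
    /* Round half away from zero without relying on libm. */
    long steps = (long)(scaled >= 0.0f ? scaled + 0.5f : scaled - 0.5f);
    *snapped = (float)steps / scale;
    return true;
}
```

    For example, with four fraction bits an offset of 0.3 would snap to
    5/16 = 0.3125 under this model.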


    Modify Section 3.12.2, Shader Execution, p. 274

    (insert prior to the next-to-last paragraph in "Shader Inputs", p. 277)

    The built-in variable gl_SampleMaskIn[] is an integer array holding
    bitfields indicating the set of fragment samples covered by the primitive
    corresponding to the fragment shader invocation.  The number of elements
    in the array is ceil(<s>/32), where <s> is the maximum number of color
    samples supported by the implementation.  Bit <n> of element <w> in the
    array is set if and only if the sample numbered <w>*32+<n> is considered
    covered for this fragment shader invocation.  When rendering to a
    non-multisample buffer, or if multisample rasterization is disabled, all
    bits are zero except for bit zero of the first array element.  That bit
    will be one if the pixel is covered and zero otherwise.  Bits in the
    sample mask corresponding to covered samples that will be killed due to
    SAMPLE_COVERAGE or SAMPLE_MASK_NV will not be set (section 4.1.3).  When
    per-sample shading is active due to the use of a fragment input qualified
    by "sample", only the bit for the current sample is set in
    gl_SampleMaskIn.  When OpenGL API state specifies multiple fragment shader
    invocations for a given fragment, the sample mask for any single fragment
    shader invocation may specify a subset of the covered samples for the
    fragment.  In this case, the bit corresponding to each covered sample will
    be set in exactly one fragment shader invocation.
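
    The array sizing and bit layout of gl_SampleMaskIn[] can be restated as
    a small C sketch.  The helper names are hypothetical; <mask> simply
    mirrors the contents of the built-in array on the host side.

```c
#include <stdbool.h>

/* Number of elements in gl_SampleMaskIn[] for an implementation that
 * supports at most <max_samples> color samples: ceil(s / 32). */
int sample_mask_words(int max_samples)
{
    return (max_samples + 31) / 32;
}

/* True if sample number <sample> is marked covered in <mask>: bit n of
 * element w corresponds to sample w*32 + n. */
bool sample_is_covered(const int *mask, int sample)
{
    int w = sample / 32;
    int n = sample % 32;
    return (((unsigned int)mask[w] >> n) & 1u) != 0;
}
```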


Additions to Chapter 4 of the OpenGL 3.2 (Compatibility Profile) Specification
(Per-Fragment Operations and the Frame Buffer)

    None.

Additions to Chapter 5 of the OpenGL 3.2 (Compatibility Profile) Specification
(Special Functions)

    None.

Additions to Chapter 6 of the OpenGL 3.2 (Compatibility Profile) Specification
(State and State Requests)

    Modify Section 6.1.16, Shader and Program Queries, p. 384

    (add to long first paragraph, p. 386) ... If <pname> is
    GEOMETRY_SHADER_INVOCATIONS, the number of geometry shader invocations per
    primitive will be returned.  If GEOMETRY_VERTICES_OUT,
    GEOMETRY_INPUT_TYPE, GEOMETRY_OUTPUT_TYPE, or GEOMETRY_SHADER_INVOCATIONS
    is queried for a program which has not been linked successfully, or which
    does not contain objects to form a geometry shader, then an
    INVALID_OPERATION error is generated.


Additions to Appendix A of the OpenGL 3.2 (Compatibility Profile)
Specification (Invariance)

    None.

Additions to the AGL/GLX/WGL Specifications

    None.

Modifications to The OpenGL Shading Language Specification, Version 1.50
(Revision 09)

    Including the following line in a shader can be used to control the
    language features described in this extension:

      #extension GL_ARB_gpu_shader5 : <behavior>

    where <behavior> is as specified in section 3.3.

    New preprocessor #defines are added to the OpenGL Shading Language:

      #define GL_ARB_gpu_shader5        1


    Modify Section 3.6, Keywords, p. 14

    (add to the keyword list)

      sample


    Modify Section 4.1.7, Samplers, p. 23

    (modify 1st paragraph of the section, deleting the restriction requiring
    constant indexing of sampler arrays but still requiring uniform indexing
    across invocations) ... Samplers may be aggregated into arrays within a
    shader (using square brackets [ ]) and can be indexed with general integer
    expressions.  The results of accessing a sampler array with an
    out-of-bounds index are undefined. ...

    (add new paragraph restricting the use of general integer expression in
    sampler array indexing) When indexing an array of samplers, the integer
    expression used to index the array must be uniform across shader
    invocations.  If this restriction is not satisfied, the results of
    accessing the sampler array are undefined.  For the purposes of this
    uniformity test, the index used for texture lookups performed inside a
    loop is considered uniform for the <n>th loop iteration if all shader
    invocations that execute the loop at least <n> times compute the same
    index on that iteration.  For texture lookups inside a function other than
    main(), an index is considered uniform if the value is the same for all
    invocations calling the function from the same point in the caller.  For
    nested loops and function calls, the uniformity test requires that the
    index match only those other shader invocations with identical loop
    iteration counts and function call chains.
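
    The uniformity test for one point in a shader can be pictured with the
    following C sketch.  It is a model only: the arrays, the notion of an
    "active" flag, and the function name are assumptions of this example,
    and the set of invocations that must agree is left to the
    implementation.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: index[k] is the sampler-array index computed by
 * invocation k at one point in the shader (same loop iteration, same
 * call chain), and active[k] says whether invocation k reaches that
 * point at all.  The access is well defined only if every active
 * invocation computed the same index. */
bool sampler_index_is_uniform(const int *index, const bool *active,
                              size_t count)
{
    bool have_ref = false;
    int ref = 0;

    for (size_t k = 0; k < count; ++k) {
        if (!active[k])
            continue;            /* did not execute this iteration */
        if (!have_ref) {
            ref = index[k];      /* first active invocation sets the reference */
            have_ref = true;
        } else if (index[k] != ref) {
            return false;        /* divergent index: results undefined */
        }
    }
    return true;
}
```

    An invocation that has already left the loop does not take part in the
    comparison, which matches the "at least <n> times" wording above.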


    Modify Section 4.1.10, Implicit Conversions, p. 27

    (modify table of implicit conversions)

                                Can be implicitly
        Type of expression        converted to
        ---------------------   -----------------
        int                     uint, float
        ivec2                   uvec2, vec2
        ivec3                   uvec3, vec3
        ivec4                   uvec4, vec4

        uint                    float
        uvec2                   vec2
        uvec3                   vec3
        uvec4                   vec4

    (modify second paragraph of the section) No implicit conversions are
    provided to convert from unsigned to signed integer types or from
    floating-point to integer types.  There are no implicit array or structure
    conversions.

    (insert before the final paragraph of the section) When performing
    implicit conversion for binary operators, there may be multiple data types
    to which the two operands can be converted.  For example, when adding an
    int value to a uint value, both values can be implicitly converted to uint
    or to float.  In such cases, a floating-point type is chosen if either
    operand has a floating-point type.  Otherwise, an unsigned integer type is
    chosen if either operand has an unsigned integer type.  Otherwise, a
    signed integer type is chosen.
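
    The selection rule for binary operators amounts to a fixed priority
    order, sketched here in C.  The enum and function names are
    hypothetical and cover only the three base types discussed in this
    section.

```c
/* Hypothetical tags for the scalar base types in the table above. */
typedef enum { BASE_INT, BASE_UINT, BASE_FLOAT } BaseType;

/* Base type chosen for the operands of a binary operator: a
 * floating-point type wins over an unsigned integer type, which in
 * turn wins over a signed integer type. */
BaseType binary_operand_type(BaseType lhs, BaseType rhs)
{
    if (lhs == BASE_FLOAT || rhs == BASE_FLOAT)
        return BASE_FLOAT;
    if (lhs == BASE_UINT || rhs == BASE_UINT)
        return BASE_UINT;
    return BASE_INT;
}
```

    Under this rule, adding an int to a uint converts the int operand to
    uint, even though float would also be a legal conversion target.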
    

    Modify Section 4.3, Storage Qualifiers, p. 29

    (add to first table on the page)

      Qualifier         Meaning
      --------------    ----------------------------------------
      sample in         linkage with per-sample interpolation
      sample out        linkage with per-sample interpolation

    (modify third paragraph, p. 29) These interpolation qualifiers may only
    precede the qualifiers in, centroid in, sample in, out, centroid out, or
    sample out in a declaration.  ...


    Modify Section 4.3.4, Inputs, p. 31

    (modify first paragraph of section) Shader input variables are declared
    with the in, centroid in, or sample in storage qualifiers. ... Variables
    declared as in, centroid in, or sample in may not be written to during
    shader execution. ...

    (modify third paragraph, p. 32) ...  Fragment shader inputs get
    per-fragment values, typically interpolated from a previous stage's
    outputs.  They are declared in fragment shaders with the in, centroid in,
    or sample in storage qualifiers or the deprecated varying and centroid
    varying storage qualifiers. ...

    (add to examples immediately below)

      sample in vec4 perSampleColor;


    Modify Section 4.3.6, Outputs, p. 33

    (modify first paragraph of section) Shader output variables are declared
    with the out, centroid out, or sample out storage qualifiers. ...

    (modify third paragraph of section) Vertex and geometry output variables
    output per-vertex data and are declared using the out, centroid out, or
    sample out storage qualifiers, or the deprecated varying storage
    qualifier.

    (add to examples immediately below)

      sample out vec4 perSampleColor;

    (modify last paragraph, p. 33) Fragment outputs output per-fragment data
    and are declared using the out storage qualifier. It is an error to use
    centroid out or sample out in a fragment shader. ...
    

    Modify Section 4.3.7, Interface Blocks, p. 34

    (modify last paragraph, p. 36, removing the requirement for indexing
    uniform blocks using constant expressions) For uniform blocks declared as
    arrays, each individual array element corresponds to a separate buffer
    object backing one instance of the block.  As the array size indicates the
    number of buffer objects needed, uniform block array declarations must
    specify an integral array size.  Arbitrary indices may be used to index a
    uniform block array; integral constant expressions are not required.  If
    the index used to access an array of uniform blocks is out-of-bounds, the
    results of the access are undefined.


    Modify Section 4.3.8.1, Input Layout Qualifiers, p. 37

    (modify last paragraph, p. 37, and subsequent paragraphs on p. 38)

    Geometry shaders support input layout qualifiers.  There are two types of
    layout qualifiers used to specify an input primitive type and an
    invocation count.  The input primitive type and invocation count
    qualifiers are allowed only on the interface qualifier in, not on an input
    block, block member, or variable.

      layout-qualifier-id
        points
        lines
        lines_adjacency
        triangles
        triangles_adjacency
        invocations = integer-constant

    The identifiers "points", "lines", "lines_adjacency", "triangles", and
    "triangles_adjacency" are used to specify the type of input primitive
    accepted by the geometry shader, and only one of these is accepted.  At
    least one geometry shader (compilation unit) in a program must declare an
    input primitive type, and all geometry shader input primitive type
    declarations in a program must declare the same type.  It is not required
    that all geometry shaders in a program declare an input primitive type.

    The identifier "invocations" is used to specify the number of times the
    geometry shader is invoked for each input primitive received.  Invocation
    count declarations are optional.  If no invocation count is declared in
    any geometry shader in the program, the geometry shader will be run once
    for each input primitive.  If an invocation count is declared, all such
    declarations must specify the same count.  If a shader specifies an
    invocation count greater than the implementation-dependent maximum, it
    will fail to compile.

    For example,

      layout(triangles, invocations=6) in;

    will establish that all inputs to the geometry shader are triangles and
    that the geometry shader is run six times for each triangle processed.
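
    The rules for combining invocation count declarations can be sketched
    in C as follows.  This is a simplified model with a hypothetical
    function name; it folds the compile-time limit check and the
    consistency check into a single pass/fail result.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of combining "invocations = N" declarations.
 * declared[i] is the count given by the i-th declaration.  On success
 * stores the effective count in *count (1 when nothing was declared).
 * Returns false if two declarations disagree or if a count exceeds
 * <max_invocations>, which stands for the value of
 * MAX_GEOMETRY_SHADER_INVOCATIONS. */
bool resolve_invocations(const int *declared, size_t n,
                         int max_invocations, int *count)
{
    int result = 1;                     /* default: run once */

    for (size_t i = 0; i < n; ++i) {
        if (declared[i] > max_invocations)
            return false;               /* over the implementation limit */
        if (i > 0 && declared[i] != declared[0])
            return false;               /* declarations must agree */
        result = declared[i];
    }
    *count = result;
    return true;
}
```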

    All geometry shader input unsized array declarations ...


    Modify Section 4.3.8.2, Output Layout Qualifiers, p. 40

    (modify second and subsequent paragraphs, p. 40)

    Geometry shaders can have output layout qualifiers.  There are three types
    of output layout qualifiers used to specify an output primitive type, a
    maximum output vertex count, and per-output stream numbers.  The output
    primitive type and output vertex count qualifiers are allowed only on the
    interface qualifier out, not on an output block, block member, or variable
    declaration.  The output stream number qualifier is allowed on the
    interface qualifier out, or on output blocks or variable declarations.

    The layout qualifier identifiers for geometry shader outputs are

      layout-qualifier-id
        points
        line_strip
        triangle_strip
        max_vertices = integer-constant
        stream = integer-constant

    The identifiers "points", "line_strip", and "triangle_strip" are used to
    specify the type of output primitive produced by the geometry shader, and
    only one of these is accepted.  At least one geometry shader (compilation
    unit) in a program must declare an output primitive type, and all geometry
    shader output primitive type declarations in a program must declare the
    same primitive type.  It is not required that all geometry shaders in a
    program declare an output primitive type.

    The identifier "max_vertices" is used to specify the maximum number of
    vertices the shader will ever emit in a single invocation.  At least one
    geometry shader (compilation unit) in a program must declare a maximum
    output vertex count, and all geometry shader output vertex count
    declarations in a program must declare the same count.  It is not required
    that all geometry shaders in a program declare a count.

    In the example,

      layout(triangle_strip, max_vertices = 60) out; // order does not matter
      layout(max_vertices = 60) out; // redeclaration okay
      layout(triangle_strip) out; // redeclaration okay
      layout(points) out; // error, contradicts triangle_strip
      layout(max_vertices = 30) out; // error, contradicts 60

    all outputs from the geometry shader are triangles and at most 60 vertices
    will be emitted by the shader.  It is an error for the maximum number of
    vertices to be greater than gl_MaxGeometryOutputVertices.

    The identifier "stream" is used to specify that a geometry shader output
    variable or block is associated with a particular vertex stream (numbered
    beginning with zero).  A default stream number may be declared at global
    scope by qualifying interface qualifier out as in this example:

      layout(stream = 1) out;

    The stream number specified in such a declaration replaces any previous
    default and applies to all subsequent block and variable declarations
    until a new default is established.  The initial default stream number is
    zero.

    Each output block or non-block output variable is associated with a vertex
    stream.  If the block or variable is declared with a stream qualifier, it
    is associated with the specified stream; otherwise, it is associated with
    the current default stream.  A block member may be declared with a stream
    qualifier, but the specified stream must match the stream associated with
    the containing block.  One example:

      layout(stream=1) out;             // default is now stream 1
      out vec4 var1;                    // var1 gets default stream (1)
      layout(stream=2) out Block1 {     // "Block1" belongs to stream 2
        layout(stream=2) vec4 var2;     // redundant block member stream decl
        layout(stream=3) vec2 var3;     // ILLEGAL (must match block stream)
        vec3 var4;                      // belongs to stream 2
      };
      layout(stream=0) out;             // default is now stream 0
      out vec4 var5;                    // var5 gets default stream (0)
      out Block2 {                      // "Block2" gets default stream (0)
        vec4 var6;
      };
      layout(stream=3) out vec4 var7;   // var7 belongs to stream 3

    If a geometry shader output block or variable is declared more than once,
    all such declarations must associate the variable with the same vertex
    stream.  If any stream declaration specifies a non-existent stream number,
    the shader will fail to compile.
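
    The default-stream bookkeeping described above behaves like a small
    state machine.  The following C sketch restates it; the StreamState
    type and the function names are assumptions of this example, and a
    negative stream number is used to mean "no stream qualifier given".

```c
#include <stdbool.h>

/* Hypothetical model of the current default stream.  Initialize
 * default_stream to zero, the initial default. */
typedef struct { int default_stream; } StreamState;

/* "layout(stream = N) out;" replaces the current default. */
void set_default_stream(StreamState *s, int stream)
{
    s->default_stream = stream;
}

/* Stream for a block or non-block variable.  Pass a negative
 * <explicit_stream> when the declaration has no stream qualifier. */
int resolve_stream(const StreamState *s, int explicit_stream)
{
    return explicit_stream >= 0 ? explicit_stream : s->default_stream;
}

/* A block member may repeat the stream of its block but may not
 * select a different one. */
bool member_stream_ok(int block_stream, int member_explicit_stream)
{
    return member_explicit_stream < 0 ||
           member_explicit_stream == block_stream;
}
```

    Replaying the declarations from the example gives stream 1 for var1,
    stream 2 for Block1 (with var3 rejected), stream 0 for var5 and Block2,
    and stream 3 for var7.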

    Built-in geometry shader outputs are always associated with vertex stream
    zero.

    Each vertex emitted by the geometry shader is assigned to a specific
    stream, and the attributes of the emitted vertex are taken from the set of
    output blocks and variables assigned to the targeted stream.  After each
    vertex is emitted, the values of all output variables become undefined.
    Additionally, the output variables associated with each vertex stream may
    share storage.  Writing to an output variable associated with one stream
    may overwrite output variables associated with any other stream.  When
    emitting each vertex, a geometry shader should write to all outputs
    associated with the stream to which the vertex will be emitted and to no
    outputs associated with any other stream.


    Modify Section 4.3.9, Interpolation, p. 42

    (modify first paragraph of section, add reference to sample in/out) The
    presence of and type of interpolation is controlled by the storage
    qualifiers centroid in, sample in, centroid out, and sample out, by the
    optional interpolation qualifiers smooth, flat, and noperspective, and by
    default behaviors established through the OpenGL API when no interpolation
    qualifier is present. ...

    (modify second paragraph) ... A variable may be qualified as flat centroid
    or flat sample, which will mean the same thing as qualifying it only as
    flat.

    (replace last paragraph, p. 42) 

    When multisample rasterization is disabled, or for fragment shader input
    variables qualified with neither "centroid in" nor "sample in", the value
    of the assigned variable may be interpolated anywhere within the pixel and
    a single value may be assigned to each sample within the pixel, to the
    extent permitted by the OpenGL Specification.

    When multisample rasterization is enabled, "centroid" and "sample" may be
    used to control the location and frequency of the sampling of the
    qualified fragment shader input.  If a fragment shader input is qualified
    with "centroid", a single value may be assigned to that variable for all
    samples in the pixel, but that value must be interpolated at a location
    that lies in both the pixel and in the primitive being rendered, including
    any of the pixel's samples covered by the primitive.  Because the location
    at which the variable is sampled may be different in neighboring pixels,
    derivatives of centroid-sampled inputs may be less accurate than those for
    non-centroid interpolated variables.  If a fragment shader input is
    qualified with "sample", a separate value must be assigned to that
    variable for each covered sample in the pixel, and that value must be
    sampled at the location of the individual sample.


    (Insert before Section 4.7, Order of Qualification, p. 47)

    Section 4.Q, The Precise Qualifier

    Some algorithms may require that floating-point computations be carried
    out in exactly the manner specified in the source code, even if the
    implementation supports optimizations that could produce nearly equivalent
    results with higher performance.  For example, many GL implementations
    support a "multiply-add" that can compute values such as

      float result = (float(a) * float(b)) + float(c);

    in a single operation.  The result of a floating-point multiply-add may
    not always be identical to first doing a multiply yielding a
    floating-point result, and then doing a floating-point add.  By default,
    implementations are permitted to perform optimizations that effectively
    modify the order of the operations used to evaluate an expression, even if
    those optimizations may produce slightly different results relative to
    unoptimized code.
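
    The difference between the two evaluation strategies can be observed on
    the host with a short C sketch.  It assumes IEEE 754 single-precision
    arithmetic with round-to-nearest-even.  The second function is only a
    stand-in for a fused multiply-add: it keeps the exact product in double
    precision, which approximates, but is not guaranteed to match, a true
    single-rounding operation in every case.  Both function names are
    hypothetical.

```c
/* Multiply, round the product to float, then add.  The volatile
 * temporary forces the product to be rounded to float and keeps the
 * compiler from fusing the two operations. */
float mul_then_add(float a, float b, float c)
{
    volatile float product = a * b;
    return product + c;
}

/* Stand-in for a fused multiply-add: the product of two floats is
 * exact in double, so it is not rounded to float before the add.
 * (Assumption: the final double-to-float conversion may round a
 * second time, unlike a true fused operation.) */
float fused_like(float a, float b, float c)
{
    return (float)((double)a * (double)b + (double)c);
}
```

    With a = b = 1 + 2^-12 and c = -(1 + 2^-11), the exact product is
    1 + 2^-11 + 2^-24.  Rounding it to float drops the last term, so
    mul_then_add() returns 0, while fused_like() returns 2^-24.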

    The qualifier "precise" will ensure that operations contributing to a
    variable's value are performed in the order and with the precision
    specified in the source code.  Order of evaluation is determined by
    operator precedence and parentheses, as described in Section 5.
    Expressions must be evaluated with a precision consistent with the
    operation; for example, multiplying two "float" values must produce a
    single value with "float" precision.  This effectively prohibits the
    arbitrary use of fused multiply-add operations if the intermediate
    multiply result is kept at a higher precision.  For example:

      precise out vec4 position;

    declares that computations used to produce the value of "position" must be
    performed precisely using the order and precision specified.  As with the
    invariant qualifier (section 4.6.1), the precise qualifier may be used to
    qualify a built-in or previously declared user-defined variable as being
    precise:

      out vec3 Color;
      precise Color;            // make existing Color be precise

    This qualifier will affect the evaluation of expressions used on the
    right-hand side of an assignment if and only if:

      * the variable assigned to is qualified as "precise"; or

      * the value assigned is used later in the same function, either directly
        or indirectly, on the right-hand side of an assignment to a variable
        declared as "precise".

    Expressions computed in a function are treated as precise only if assigned
    to a variable qualified as "precise" in that same function.  Any other
    expressions within a function are not automatically treated as precise,
    even if they are used to determine a value that is returned by the
    function and directly assigned to a variable qualified as "precise".

    Some examples of the use of "precise" include:

      in vec4 a, b, c, d;
      precise out vec4 v;

      float func(float e, float f, float g, float h)
      {
        return (e*f) + (g*h);            // no special precision
      }

      float func2(float e, float f, float g, float h)
      {
        precise float result = (e*f) + (g*h);  // ensures a precise return value
        return result;
      }

      void func3(float i, float j, precise out float k)
      {
        k = i * i + j;                   // precise, due to <k> declaration
      }

      void main(void)
      {
        vec3 r = vec3(a * b);           // precise, used to compute v.xyz
        vec3 s = vec3(c * d);           // precise, used to compute v.xyz
        v.xyz = r + s;                          // precise
        v.w = (a.w * b.w) + (c.w * d.w);        // precise
        v.x = func(a.x, b.x, c.x, d.x);         // values computed in func()
                                                // are NOT precise
        v.x = func2(a.x, b.x, c.x, d.x);        // precise!
        func3(a.x * b.x, c.x * d.x, v.x);       // precise!
      }


    Modify Section 4.7, Order of Qualification, p. 47

    When multiple qualifications are present, they must follow a strict order.
    This order is as follows:

      precise-qualifier invariant-qualifier interpolation-qualifier storage-qualifier
         precision-qualifier


    Modify Section 5.9, Expressions, p. 57

    (modify bulleted list as follows, adding support for implicit conversion
    between signed and unsigned types)

    Expressions in the shading language are built from the following:

    * Constants of type bool, int, int64_t, uint, uint64_t, float, all vector
      types, and all matrix types.

    ...

    * The operator modulus (%) operates on signed or unsigned integer scalars
      or vectors.  If the fundamental types of the operands do not match, the
      conversions from Section 4.1.10 "Implicit Conversions" are applied to
      produce matching types.  ...
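
    A minimal, non-normative sketch of a modulus with mixed operand types
    follows.  It assumes that the implicit conversions referenced above
    include conversion from "int" to "uint"; the function name "mixedModulus"
    is hypothetical.

```glsl
      #version 150
      #extension GL_ARB_gpu_shader5 : require

      uint mixedModulus(int i, uint count)
      {
        // Assumption: <i> is implicitly converted from int to uint before the
        // modulus is evaluated, so both operands have type uint.  A negative
        // <i> is therefore treated as a large unsigned value.
        return i % count;
      }
```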


    Modify Section 6.1, Function Definitions, p. 63

    (modify description of overloading, beginning at the top of p. 64)

     Function names can be overloaded.  The same function name can be used for
     multiple functions, as long as the parameter types differ.  If a function
     name is declared twice with the same parameter types, then the return
     types and all qualifiers must also match, and it is the same function
     being declared.  For example,

       vec4 f(in vec4 x, out vec4  y);   // (A)
       vec4 f(in vec4 x, out uvec4 y);   // (B) okay, different argument type
       vec4 f(in ivec4 x, out uvec4 y);  // (C) okay, different argument type

       int  f(in vec4 x, out vec4  y);  // error, only return type differs
       vec4 f(in vec4 x, in  vec4  y);  // error, only qualifier differs
       vec4 f(const in vec4 x, out vec4 y);  // error, only qualifier differs

     When function calls are resolved, an exact type match for all the
     arguments is sought.  If an exact match is found, all other functions are
     ignored, and the exact match is used.  If no exact match is found, then
     the implicit conversions in Section 4.1.10 (Implicit Conversions) will be
     applied to find a match.  Mismatched types on input parameters (in or
     inout or default) must have a conversion from the calling argument type
     to the formal parameter type.  Mismatched types on output parameters (out
     or inout) must have a conversion from the formal parameter type to the
     calling argument type.

     If implicit conversions can be used to find more than one matching
     function, a single best-matching function is sought.  To determine a best
     match, the conversions between calling argument and formal parameter
     types are compared for each function argument and pair of matching
     functions.  After these comparisons are performed, each pair of matching
     functions is compared.  A function definition A is considered a better
     match than function definition B if:

       * for at least one function argument, the conversion for that argument
         in A is better than the corresponding conversion in B; and

       * there is no function argument for which the conversion in B is better
         than the corresponding conversion in A.

     If a single function definition is considered a better match than every
     other matching function definition, it will be used.  Otherwise, a
     semantic error occurs and the shader will fail to compile.

     To determine whether the conversion for a single argument in one match is
     better than that for another match, the following rules are applied, in
     order:

       1. An exact match is better than a match involving any implicit
          conversion.

       2. A match involving an implicit conversion from float to double is
          better than a match involving any other implicit conversion.

       3. A match involving an implicit conversion from either int or uint to
          float is better than a match involving an implicit conversion from
          either int or uint to double.

     If none of the rules above apply to a particular pair of conversions,
     neither conversion is considered better than the other.

     For the function prototypes (A), (B), and (C) above, the following
     examples show how the rules apply to different sets of calling argument
     types:

       f(vec4, vec4);        // exact match of vec4 f(in vec4 x, out vec4 y)
       f(vec4, uvec4);       // exact match of vec4 f(in vec4 x, out uvec4 y)
       f(vec4, ivec4);       // matched to vec4 f(in vec4 x, out vec4 y)
                             //   (C) not relevant, can't convert vec4 to 
                             //   ivec4.  (A) better than (B) for 2nd
                             //   argument (rule 2), same on first argument.
       f(ivec4, vec4);       // NOT matched.  All three match by implicit
                             //   conversion.  (C) is better than (A) and (B)
                             //   on the first argument.  (A) is better than
                             //   (B) and (C) on the second argument.


    Modify Section 7.1, Vertex And Geometry Shader Special Variables, p. 69

    (add to the list of geometry shader special variables, p. 69)

      in int gl_InvocationID;

    (add to the end of the section, p. 71)

    The input variable gl_InvocationID is available in the geometry language
    and is filled with an integer holding the invocation number associated
    with the given shader invocation.  If the program is linked to support
    multiple geometry shader invocations per input primitive, the invocations
    are numbered 0, 1, 2, ..., <N>-1.  gl_InvocationID is not available in the
    vertex or fragment language.
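
    The following non-normative sketch shows one possible use of
    gl_InvocationID: each invocation routes a copy of the input triangle to a
    different layer.  It assumes the "invocations" input layout qualifier
    specified elsewhere in this extension; the invocation count of four is
    arbitrary.

```glsl
      #version 150
      #extension GL_ARB_gpu_shader5 : require

      layout(triangles, invocations = 4) in;   // run four invocations per triangle
      layout(triangle_strip, max_vertices = 3) out;

      void main(void)
      {
        for (int i = 0; i < 3; i++) {
          gl_Position = gl_in[i].gl_Position;
          gl_Layer = gl_InvocationID;          // 0, 1, 2, or 3
          EmitVertex();
        }
        EndPrimitive();
      }
```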


    Modify Section 7.2, Fragment Shader Special Variables, p. 72

    (add to the list of built-in variables)

      in int gl_SampleMaskIn[];

    The variable gl_SampleMaskIn is an array of integers, each holding a
    bitfield indicating the set of samples covered by the primitive generating
    the fragment during multisample rasterization.  The array has ceil(<s>/32)
    elements, where <s> is the maximum number of color samples supported by
    the implementation.  Bit <n> of word <w> in the bitfield is set if and
    only if the sample numbered <w>*32+<n> is considered covered for this
    fragment shader invocation.
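
    As a non-normative sketch, a fragment shader could estimate the fraction
    of the pixel covered by the primitive as shown below.  The example assumes
    at most 32 samples, so only word 0 of the mask is read; the uniform
    "numSamples" is hypothetical and would be supplied by the application.

```glsl
      #version 150
      #extension GL_ARB_gpu_shader5 : require

      uniform int numSamples;   // hypothetical: sample count of the framebuffer
      out vec4 fragColor;

      void main(void)
      {
        // Word 0 of the mask describes samples 0 through 31.
        int covered = bitCount(gl_SampleMaskIn[0]);
        float coverage = float(covered) / float(max(numSamples, 1));
        fragColor = vec4(vec3(coverage), 1.0);
      }
```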


    Modify Section 8.3, Common Functions, p. 84

    (add support for floating-point multiply-add)

    Syntax:

      genType fma(genType a, genType b, genType c);

    The function fma() performs a fused floating-point multiply-add to compute
    the value a*b+c.  The results of fma() may not be identical to evaluating
    the expression (a*b)+c, because the computation may be performed in a
    single operation with intermediate precision different from that used to
    compute a non-fma() expression.

    The results of fma() are guaranteed to be invariant given fixed inputs
    <a>, <b>, and <c>, as though the result were taken from a variable
    declared as "precise".
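
    One common use of fma() is polynomial evaluation by Horner's rule.  The
    sketch below is illustrative only; the function name "horner2" and its
    parameter names are hypothetical.

```glsl
      #version 150
      #extension GL_ARB_gpu_shader5 : require

      // Evaluates c0 + x*(c1 + x*c2) using two fused multiply-adds.
      float horner2(float x, float c0, float c1, float c2)
      {
        return fma(x, fma(x, c2, c1), c0);
      }
```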


    (add support for single-precision frexp and ldexp functions)

    Syntax:

      genType frexp(genType x, out genIType exp);
      genType ldexp(genType x, in genIType exp);

    The function frexp() splits each single-precision floating-point number in
    <x> into a binary significand, a floating-point number in the range [0.5,
    1.0), and an integral exponent of two, such that:

      x = significand * 2 ^ exponent

    The significand is returned by the function; the exponent is returned in
    the parameter <exp>.  For a floating-point value of zero, the significand
    and exponent are both zero.  For a floating-point value that is an
    infinity or is not a number, the results of frexp() are undefined.  

    If the input <x> is a vector, this operation is performed in a
    component-wise manner; the value returned by the function and the value
    written to <exp> are vectors with the same number of components as <x>.

    The function ldexp() builds a single-precision floating-point number from
    each significand component in <x> and the corresponding integral exponent
    of two in <exp>, returning:

      significand * 2 ^ exponent

    If this product is too large to be represented as a single-precision
    floating-point value, the result is considered undefined.

    If the input <x> is a vector, this operation is performed in a
    component-wise manner; the value passed in <exp> and returned by the
    function are vectors with the same number of components as <x>.
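
    A short, non-normative sketch of splitting a value and rebuilding it
    follows; the function name "splitAndRebuild" is hypothetical.

```glsl
      #version 150
      #extension GL_ARB_gpu_shader5 : require

      float splitAndRebuild(float x)
      {
        int e;
        float m = frexp(x, e);   // for x = 8.0: m = 0.5 and e = 4
        return ldexp(m, e);      // 0.5 * 2^4 = 8.0
      }
```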


    (add support for new integer built-in functions)

    Syntax:

      genIType bitfieldExtract(genIType value, int offset, int bits);
      genUType bitfieldExtract(genUType value, int offset, int bits);

      genIType bitfieldInsert(genIType base, genIType insert, int offset, 
                              int bits);
      genUType bitfieldInsert(genUType base, genUType insert, int offset, 
                              int bits);

      genIType bitfieldReverse(genIType value);
      genUType bitfieldReverse(genUType value);

      genIType bitCount(genIType value);
      genIType bitCount(genUType value);

      genIType findLSB(genIType value);
      genIType findLSB(genUType value);

      genIType findMSB(genIType value);
      genIType findMSB(genUType value);

    The function bitfieldExtract() extracts bits <offset> through
    <offset>+<bits>-1 from each component in <value>, returning them in the
    least significant bits of the corresponding component of the result.  For
    unsigned data types, the most significant bits of the result will be set
    to zero.  For signed data types, the most significant bits will be set to
    the value of bit <offset>+<bits>-1.  If <bits> is zero, the result will be
    zero.  The result will be undefined if <offset> or <bits> is negative, or
    if the sum of <offset> and <bits> is greater than the number of bits used
    to store the operand.  Note that for vector versions of bitfieldExtract(),
    a single pair of <offset> and <bits> values is shared for all components.

    The function bitfieldInsert() inserts the <bits> least significant bits of
    each component of <insert> into the corresponding component of <base>.
    The result will have bits numbered <offset> through <offset>+<bits>-1
    taken from bits 0 through <bits>-1 of <insert>, and all other bits taken
    directly from the corresponding bits of <base>.  If <bits> is zero, the
    result will simply be <base>.  The result will be undefined if <offset> or
    <bits> is negative, or if the sum of <offset> and <bits> is greater than
    the number of bits used to store the operand.  Note that for vector
    versions of bitfieldInsert(), a single pair of <offset> and <bits> values
    is shared for all components.

    The function bitfieldReverse() reverses the bits of <value>.  The bit
    numbered <n> of the result will be taken from bit (<bits>-1)-<n> of
    <value>, where <bits> is the total number of bits used to represent
    <value>.

    The function bitCount() returns the number of one bits in the binary
    representation of <value>.

    The function findLSB() returns the bit number of the least significant one
    bit in the binary representation of <value>.  If <value> is zero, -1 will
    be returned.

    The function findMSB() returns the bit number of the most significant bit
    in the binary representation of <value>.  For positive integers, the
    result will be the bit number of the most significant one bit.  For
    negative integers, the result will be the bit number of the most
    significant zero bit.  For a <value> of zero or negative one, -1 will be
    returned.
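
    The sketch below is a non-normative example of the bitfield functions.
    The helpers "packRGB565" and "unpackRGB565" and the 5:6:5 layout they use
    are hypothetical; the trailing comments list results that follow from the
    definitions above for a few constant inputs.

```glsl
      #version 150
      #extension GL_ARB_gpu_shader5 : require

      // Hypothetical layout: blue in bits 0-4, green in bits 5-10, red in
      // bits 11-15.
      uint packRGB565(uvec3 c)
      {
        uint p = 0u;
        p = bitfieldInsert(p, c.b, 0, 5);
        p = bitfieldInsert(p, c.g, 5, 6);
        p = bitfieldInsert(p, c.r, 11, 5);
        return p;
      }

      uvec3 unpackRGB565(uint p)
      {
        return uvec3(bitfieldExtract(p, 11, 5),
                     bitfieldExtract(p, 5, 6),
                     bitfieldExtract(p, 0, 5));
      }

      //   bitfieldExtract(0xABCD1234u, 8, 8)  yields 0x12u
      //   bitfieldExtract(0xF0, 4, 4)         yields -1 (signed, bit 7 is set)
      //   bitfieldReverse(1u)                 yields 0x80000000u
      //   bitCount(0xF0u)                     yields 4
      //   findLSB(0x18u)                      yields 3
      //   findMSB(0x18u)                      yields 4
      //   findMSB(-1)                         yields -1
```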


    (add support for general packing functions)

    Syntax:

      uint      packUnorm2x16(vec2 v);
      uint      packUnorm4x8(vec4 v);
      uint      packSnorm4x8(vec4 v);

      vec2      unpackUnorm2x16(uint v);
      vec4      unpackUnorm4x8(uint v);
      vec4      unpackSnorm4x8(uint v);

    The functions packUnorm2x16(), packUnorm4x8(), and packSnorm4x8() first
    convert each component of a two- or four-component vector of normalized
    floating-point values into 8- or 16-bit integer values.  Then, the results
    are packed into a 32-bit unsigned integer.  The first component of the
    vector will be written to the least significant bits of the output; the
    last component will be written to the most significant bits.

    The functions unpackUnorm2x16(), unpackUnorm4x8(), and unpackSnorm4x8()
    first unpack a single 32-bit unsigned integer into a pair of 16-bit
    unsigned integers, four 8-bit unsigned integers, or four 8-bit signed
    integers.  Then, each component is converted to a normalized floating-point
    value to generate a two- or four-component vector.  The first component of
    the vector will be extracted from the least significant bits of the input;
    the last component will be extracted from the most significant bits.

    The conversion between fixed-point and normalized floating-point values
    will be performed as below.

      function          conversion
      ---------------   -----------------------------------------------------
      packUnorm2x16     fixed_val = round(clamp(float_val, 0, +1) * 65535.0);
      packUnorm4x8      fixed_val = round(clamp(float_val, 0, +1) * 255.0);
      packSnorm4x8      fixed_val = round(clamp(float_val, -1, +1) * 127.0);
      unpackUnorm2x16   float_val = fixed_val / 65535.0;
      unpackUnorm4x8    float_val = fixed_val / 255.0;
      unpackSnorm4x8    float_val = clamp(fixed_val / 127.0, -1, +1);
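
    As a non-normative illustration of the table above, the hypothetical
    helpers below store an RGBA color in a single 32-bit word and recover it.

```glsl
      #version 150
      #extension GL_ARB_gpu_shader5 : require

      uint encodeColor(vec4 rgba)
      {
        // The x component lands in bits 0-7, the w component in bits 24-31.
        // encodeColor(vec4(1.0, 0.0, 0.0, 1.0)) yields 0xFF0000FFu.
        return packUnorm4x8(rgba);
      }

      vec4 decodeColor(uint bits)
      {
        // decodeColor(0x000000FFu) yields vec4(1.0, 0.0, 0.0, 0.0).
        return unpackUnorm4x8(bits);
      }
```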


    (add functions to get/set the bit encoding for floating-point values)

    32-bit floating-point data types in the OpenGL shading language are
    specified to be encoded according to the IEEE 754 specification for
    single-precision floating-point values.  The functions below allow shaders
    to convert floating-point values to and from signed or unsigned integers
    representing their encoding.

    To obtain signed or unsigned integer values holding the encoding of a
    floating-point value, use:

      genIType floatBitsToInt(genType value);
      genUType floatBitsToUint(genType value);

    Conversions are done on a component-by-component basis.

    To obtain a floating-point value corresponding to a signed or unsigned
    integer encoding, use:

      genType intBitsToFloat(genIType value);
      genType uintBitsToFloat(genUType value);
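
    The following non-normative sketch manipulates the sign bit of the
    single-precision encoding directly.  The function names are hypothetical.

```glsl
      #version 150
      #extension GL_ARB_gpu_shader5 : require

      // floatBitsToUint(1.0) yields 0x3F800000u.

      bool signBitSet(float x)
      {
        return (floatBitsToUint(x) & 0x80000000u) != 0u;
      }

      float clearSignBit(float x)
      {
        return uintBitsToFloat(floatBitsToUint(x) & 0x7FFFFFFFu);
      }
```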


    (support for unsigned integer add/subtract with carry-out)

    Syntax:

      genUType uaddCarry(genUType x, genUType y, out genUType carry);
      genUType usubBorrow(genUType x, genUType y, out genUType borrow);

    The function uaddCarry() adds 32-bit unsigned integers or vectors <x> and
    <y>, returning the sum modulo 2^32.  The value <carry> is set to zero if
    the sum was less than 2^32, or one otherwise.

    The function usubBorrow() subtracts the 32-bit unsigned integer or vector
    <y> from <x>, returning the difference if non-negative or 2^32 plus the
    difference, otherwise.  The value <borrow> is set to zero if x >= y, or
    one otherwise.
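
    These functions can be chained to build wider arithmetic.  The sketch
    below is non-normative and assumes a hypothetical convention in which a
    64-bit unsigned value is held in a uvec2, with the low word in .x and the
    high word in .y.

```glsl
      #version 150
      #extension GL_ARB_gpu_shader5 : require

      // add64(uvec2(0xFFFFFFFFu, 0u), uvec2(1u, 0u)) yields uvec2(0u, 1u).
      uvec2 add64(uvec2 a, uvec2 b)
      {
        uint carry;
        uint lo = uaddCarry(a.x, b.x, carry);
        uint hi = a.y + b.y + carry;      // wraps modulo 2^32
        return uvec2(lo, hi);
      }

      uvec2 sub64(uvec2 a, uvec2 b)
      {
        uint borrow;
        uint lo = usubBorrow(a.x, b.x, borrow);
        uint hi = a.y - b.y - borrow;     // wraps modulo 2^32
        return uvec2(lo, hi);
      }
```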


    (support for signed and unsigned multiplies, with 32-bit inputs and a
     64-bit result spanning two 32-bit outputs)

    Syntax:

      void umulExtended(genUType x, genUType y, out genUType msb, 
                        out genUType lsb);
      void imulExtended(genIType x, genIType y, out genIType msb,
                        out genIType lsb);

    The functions umulExtended() and imulExtended() multiply 32-bit unsigned
    or signed integers or vectors <x> and <y>, producing a 64-bit result.  The
    32 least significant bits are returned in <lsb>; the 32 most significant
    bits are returned in <msb>.
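
    A minimal, non-normative sketch of a full 32x32-bit unsigned multiply
    follows.  The function name "mul32x32" and the uvec2 result convention
    (low word in .x, high word in .y) are hypothetical.

```glsl
      #version 150
      #extension GL_ARB_gpu_shader5 : require

      // mul32x32(0xFFFFFFFFu, 2u) yields uvec2(0xFFFFFFFEu, 1u).
      uvec2 mul32x32(uint x, uint y)
      {
        uint msb;
        uint lsb;
        umulExtended(x, y, msb, lsb);
        return uvec2(lsb, msb);
      }
```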


    Modify Section 8.7, Texture Lookup Functions, p. 91

    (extend the basic versions of textureGather from ARB_texture_gather,
     allowing for optional component selection in a multi-component texture
     and for shadow mapping)

    Syntax:
      gvec4 textureGather(gsampler2D sampler, vec2 coord[, int comp]);
      gvec4 textureGather(gsampler2DArray sampler, vec3 coord[, int comp]);
      gvec4 textureGather(gsamplerCube sampler, vec3 coord[, int comp]);
      gvec4 textureGather(gsamplerCubeArray sampler, vec4 coord[, int comp]);
      gvec4 textureGather(gsampler2DRect sampler, vec2 coord[, int comp]);

      vec4 textureGather(sampler2DShadow sampler, vec2 coord, float refZ);
      vec4 textureGather(sampler2DArrayShadow sampler, vec3 coord, float refZ);
      vec4 textureGather(samplerCubeShadow sampler, vec3 coord, float refZ);
      vec4 textureGather(samplerCubeArrayShadow sampler, vec4 coord, 
                         float refZ);
      vec4 textureGather(sampler2DRectShadow sampler, vec2 coord, float refZ);

    The textureGather() functions use the texture coordinates given by <coord>
    to determine a set of four texels to sample from the texture identified by
    <sampler>.  These functions return a four-component vector consisting of
    one component from each texel.  If specified, the value of <comp> must be
    a constant integer expression with a value of zero, one, two, or three,
    identifying the <x>, <y>, <z>, or <w> component of the four-component
    vector lookup result for each texel, respectively.  If <comp> is not
    specified, the <x> component of each texel will be used to generate the
    result vector.  As described in the OpenGL Specification, the vector
    selects the post-swizzle component corresponding to <comp> from each of
    the four texels, returning:

      vec4(T_i0_j1(coord, base).<comp>,
           T_i1_j1(coord, base).<comp>,
           T_i1_j0(coord, base).<comp>,
           T_i0_j0(coord, base).<comp>)

    For textureGather() functions using a shadow sampler type, each of the
    four texel lookups performs a depth comparison against the depth reference
    value passed in <refZ>, and returns the result of that comparison in the
    appropriate component of the result vector.  The parameter <comp> used for
    component selection is not supported for textureGather() functions with
    shadow sampler types.

    As with other texture lookup functions, the results of textureGather() are
    undefined for shadow samplers if the texture referenced is not a depth
    texture or has depth comparisons disabled; or for non-shadow samplers if
    the texture referenced is a depth texture with depth comparisons enabled.
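
    The non-normative sketch below gathers the green component of a color
    texture and performs an unweighted 2x2 depth comparison.  All variable
    names are hypothetical, and the equal-weight averaging is a simplification
    chosen for brevity.

```glsl
      #version 150
      #extension GL_ARB_gpu_shader5 : require
      #extension GL_ARB_texture_gather : enable

      uniform sampler2D colorMap;
      uniform sampler2DShadow shadowMap;

      in vec2 uv;
      in vec3 shadowCoord;      // xy = texture coordinate, z = depth reference
      out vec4 fragColor;

      void main(void)
      {
        // <comp> = 1 selects the <y> component of each of the four texels.
        vec4 greens = textureGather(colorMap, uv, 1);

        // Shadow variant: each component holds one depth comparison result.
        vec4 lit = textureGather(shadowMap, shadowCoord.xy, shadowCoord.z);

        float avgGreen = dot(greens, vec4(0.25));
        float visibility = dot(lit, vec4(0.25));
        fragColor = vec4(vec3(avgGreen * visibility), 1.0);
      }
```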


    (extend the "Offset" versions of textureGather from ARB_texture_gather,
     allowing for optional component selection in a multi-component texture,
     non-constant offsets, and shadow mapping)

    Syntax:
      gvec4 textureGatherOffset(gsampler2D sampler, vec2 coord, 
                                ivec2 offset[, int comp]);
      gvec4 textureGatherOffset(gsampler2DArray sampler, vec3 coord, 
                                ivec2 offset[, int comp]);
      gvec4 textureGatherOffset(gsampler2DRect sampler, vec2 coord, 
                                ivec2 offset[, int comp]);

      vec4 textureGatherOffset(sampler2DShadow sampler, vec2 coord, 
                               float refZ, ivec2 offset);
      vec4 textureGatherOffset(sampler2DArrayShadow sampler, vec3 coord, 
                               float refZ, ivec2 offset);
      vec4 textureGatherOffset(sampler2DRectShadow sampler, vec2 coord, 
                               float refZ, ivec2 offset);

    The textureGatherOffset() functions operate identically to
    textureGather(), except that the 2-component integer texel offset vector
    <offset> is applied as a (u,v) offset to determine the four texels to
    sample.  The value <offset> need not be constant; however, a limited range
    of offset values are supported.  If any component of <offset> is less than
    MIN_PROGRAM_TEXTURE_GATHER_OFFSET_ARB or greater than
    MAX_PROGRAM_TEXTURE_GATHER_OFFSET_ARB, the offset applied to the texture
    coordinates is undefined.  Note that <offset> does not apply to the layer
    coordinate for array textures.
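
    The following non-normative sketch uses a run-time offset.  The uniforms
    "tapOffset" and "gatherOffsetRange" are hypothetical; the latter is
    assumed to hold the minimum and maximum offsets queried by the
    application, and is used to keep the offset within the supported range.

```glsl
      #version 150
      #extension GL_ARB_gpu_shader5 : require
      #extension GL_ARB_texture_gather : enable

      uniform sampler2D heightMap;
      uniform ivec2 tapOffset;          // hypothetical: chosen at run time
      uniform ivec2 gatherOffsetRange;  // hypothetical: x = minimum, y = maximum

      in vec2 uv;
      out vec4 fragColor;

      void main(void)
      {
        ivec2 offset = clamp(tapOffset, gatherOffsetRange.x, gatherOffsetRange.y);
        vec4 h = textureGatherOffset(heightMap, uv, offset, 0);
        float highest = max(max(h.x, h.y), max(h.z, h.w));
        fragColor = vec4(vec3(highest), 1.0);
      }
```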

    
    (add new "Offsets" versions of textureGather from ARB_texture_gather,
     allowing for optional component selection in a multi-component texture,
     separate constant offsets for each texel in the footprint, and shadow
     mapping)

    Syntax:
      gvec4 textureGatherOffsets(gsampler2D sampler, vec2 coord,
                                 ivec2 offsets[4][, int comp]);
      gvec4 textureGatherOffsets(gsampler2DArray sampler, vec3 coord,
                                 ivec2 offsets[4][, int comp]);
      gvec4 textureGatherOffsets(gsampler2DRect sampler, vec2 coord,
                                 ivec2 offsets[4][, int comp]);

      vec4 textureGatherOffsets(sampler2DShadow sampler, vec2 coord, 
                                float refZ, ivec2 offsets[4]);
      vec4 textureGatherOffsets(sampler2DArrayShadow sampler, vec3 coord,
                                float refZ, ivec2 offsets[4]);
      vec4 textureGatherOffsets(sampler2DRectShadow sampler, vec2 coord, 
                                float refZ, ivec2 offsets[4]);

    The textureGatherOffsets() functions operate identically to
    textureGather(), except that the array of two-component integer vectors
    <offsets> is used to determine the location of the four texels to sample.
    Each of the four texels is obtained by applying the corresponding offset
    in the four-element array <offsets> as a (u,v) coordinate offset to the
    coordinates <coord>, identifying the four-texel LINEAR footprint, and then
    selecting the texel T_i0_j0 of that footprint.  The specified values in
    <offsets> must be constant.  A limited range of offset values are
    supported; the minimum and maximum offset values are
    implementation-dependent and given by
    MIN_PROGRAM_TEXTURE_GATHER_OFFSET_ARB and
    MAX_PROGRAM_TEXTURE_GATHER_OFFSET_ARB, respectively.  Note that <offsets>
    does not apply to the layer coordinate for array textures.
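
    As a non-normative sketch, four widely spaced texels can be fetched with
    one call by supplying a constant offset array.  The names are
    hypothetical, and the offsets shown are assumed to lie within the
    implementation-dependent range.

```glsl
      #version 150
      #extension GL_ARB_gpu_shader5 : require
      #extension GL_ARB_texture_gather : enable

      uniform sampler2D colorMap;
      in vec2 uv;
      out vec4 fragColor;

      // One constant (u,v) offset per component of the result.
      const ivec2 taps[4] = ivec2[4](ivec2(-2, 0), ivec2(2, 0),
                                     ivec2(0, -2), ivec2(0, 2));

      void main(void)
      {
        vec4 reds = textureGatherOffsets(colorMap, uv, taps, 0);
        fragColor = vec4(vec3(dot(reds, vec4(0.25))), 1.0);
      }
```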

    
    Modify Section 8.8, Fragment Processing Functions, p. 101

    (add new functions to the end of section, p. 102)

    Built-in interpolation functions are available to compute an interpolated
    value of a fragment shader input variable at a shader-specified (x,y)
    location.  A separate (x,y) location may be used for each invocation of
    the built-in function, and those locations may differ from the default
    (x,y) location used to produce the default value of the input.

      float interpolateAtCentroid(float interpolant);
      vec2 interpolateAtCentroid(vec2 interpolant);
      vec3 interpolateAtCentroid(vec3 interpolant);
      vec4 interpolateAtCentroid(vec4 interpolant);

      float interpolateAtSample(float interpolant, int sample);
      vec2 interpolateAtSample(vec2 interpolant, int sample);
      vec3 interpolateAtSample(vec3 interpolant, int sample);
      vec4 interpolateAtSample(vec4 interpolant, int sample);

      float interpolateAtOffset(float interpolant, vec2 offset);
      vec2 interpolateAtOffset(vec2 interpolant, vec2 offset);
      vec3 interpolateAtOffset(vec3 interpolant, vec2 offset);
      vec4 interpolateAtOffset(vec4 interpolant, vec2 offset);

    The function interpolateAtCentroid() will return the value of the input
    varying <interpolant> sampled at a location inside both the pixel and
    the primitive being processed.  The value obtained would be the same value
    assigned to the input variable if declared with the "centroid" qualifier.

    The function interpolateAtSample() will return the value of the input
    varying <interpolant> at the location of the sample numbered <sample>.  If
    multisample buffers are not available, the input varying will be evaluated
    at the center of the pixel.  If the sample number given by <sample> does
    not exist, the position used to interpolate the input varying is
    undefined.

    The function interpolateAtOffset() will return the value of the input
    varying <interpolant> sampled at an offset from the center of the pixel
    specified by <offset>.  The two floating-point components of <offset>
    give the offset in pixels in the x and y directions, respectively.  
    An offset of (0,0) identifies the center of the pixel.  The range and
    granularity of offsets supported by this function is
    implementation-dependent.  

    For all of the interpolation functions, <interpolant> must be an input
    variable or an element of an input variable declared as an array.
    Component selection operators (e.g., ".xy") may not be used when
    specifying <interpolant>.  If <interpolant> is declared with a "flat" or
    "centroid" qualifier, the qualifier will have no effect on the
    interpolated value.  If <interpolant> is declared with the "noperspective"
    qualifier, the interpolated value will be computed without perspective
    correction.
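
    The non-normative sketch below evaluates one input at three different
    locations.  The input name "color", the choice of sample 0, and the offset
    of a quarter pixel are arbitrary and used only for illustration.

```glsl
      #version 150
      #extension GL_ARB_gpu_shader5 : require

      in vec4 color;
      out vec4 fragColor;

      void main(void)
      {
        // The whole input variable is passed; "color.xy" would not be allowed.
        vec4 atCentroid = interpolateAtCentroid(color);
        vec4 atSample0  = interpolateAtSample(color, 0);
        vec4 shifted    = interpolateAtOffset(color, vec2(0.25, -0.25));

        fragColor = (atCentroid + atSample0 + shifted) / 3.0;
      }
```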


    Modify Section 8.10, Geometry Shader Functions, p. 104

    (replace the section, using the following more general formulation)

    These functions are only available in geometry shaders.

    Syntax:

        void EmitStreamVertex(int stream);      // Geometry-only
        void EndStreamPrimitive(int stream);    // Geometry-only

        void EmitVertex();                      // Geometry-only
        void EndPrimitive();                    // Geometry-only

    Description:

    The function EmitStreamVertex() specifies that the vertex being generated
    by the geometry shader is completed.  A vertex is added to the current
    output primitive in the vertex stream numbered <stream> using the current
    values of all output variables associated with <stream>.  The values of
    any unwritten output variables associated with <stream> are undefined.
    The argument <stream> must be a constant integral expression.  The values
    of all output variables (for all output streams) are undefined after
    calling EmitStreamVertex().  If a geometry shader invocation has emitted
    more vertices than permitted by the output layout qualifier
    "max_vertices", the results of calling EmitStreamVertex() are undefined.

    The function EmitVertex() is equivalent to calling EmitStreamVertex() with
    <stream> set to zero.

    The function EndStreamPrimitive() specifies that the current output
    primitive for the vertex stream numbered <stream> is completed and that a
    new empty output primitive of the same type should be started.  The
    argument <stream> must be a constant integral expression.  This function
    does not emit a vertex.  If the output layout is declared to be "points",
    calling EndPrimitive() is optional.

    The function EndPrimitive() is equivalent to calling EndStreamPrimitive()
    with <stream> set to zero.

    A geometry shader starts with an output primitive containing no vertices
    for each stream.  When a geometry shader terminates, the current output
    primitive for each vertex stream is automatically completed.  It is not
    necessary to call EndPrimitive() or EndStreamPrimitive() for any stream
    where the geometry shader writes only a single primitive.

    Multiple vertex streams are supported only if the output primitive type is
    declared to be "points".  A program will fail to link if it contains a
    geometry shader calling EmitStreamVertex() or EndStreamPrimitive() if its
    output primitive type is not "points".
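
    A minimal, non-normative sketch of a geometry shader writing to two vertex
    streams follows.  It assumes the "stream" layout qualifier specified
    elsewhere in this extension; the output names are hypothetical.

```glsl
      #version 150
      #extension GL_ARB_gpu_shader5 : require

      layout(points) in;
      layout(points, max_vertices = 2) out;   // multiple streams require "points"

      layout(stream = 0) out vec4 positionOut;
      layout(stream = 1) out vec4 debugOut;

      void main(void)
      {
        gl_Position = gl_in[0].gl_Position;
        positionOut = gl_in[0].gl_Position;
        EmitStreamVertex(0);

        // All outputs are undefined after an emit, so the stream 1 output is
        // written only now, immediately before its own emit.
        debugOut = vec4(float(gl_PrimitiveIDIn), 0.0, 0.0, 1.0);
        EmitStreamVertex(1);

        // EndStreamPrimitive() is not called: each stream holds one primitive,
        // which is completed automatically when the shader terminates.
      }
```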


    Modify Section 9, Shading Language Grammar, p. 92

    !!! TBD !!!


GLX Protocol

    None.

Dependencies on ARB_gpu_shader_fp64

    This extension, ARB_gpu_shader_fp64, and NV_gpu_shader5 all modify the set
    of implicit conversions supported in the OpenGL Shading Language.  If more
    than one of these extensions is supported, an expression of one type may
    be converted to another type if that conversion is allowed by any of these
    specifications.

    If ARB_gpu_shader_fp64 or a similar extension introducing new data types
    is not supported, the function overloading rule in the GLSL specification
    preferring promotion of an input parameter to a smaller type over
    promotion to a larger type is never applicable, as all data types are of
    the same size.  That rule and the example referring to "double" should be
    removed.


Dependencies on NV_gpu_shader5

    This extension, ARB_gpu_shader_fp64, and NV_gpu_shader5 all modify the set
    of implicit conversions supported in the OpenGL Shading Language.  If more
    than one of these extensions is supported, an expression of one type may
    be converted to another type if that conversion is allowed by any of these
    specifications.

    This specification and NV_gpu_shader5 both lift the restriction in GLSL
    1.50 requiring that indexing in arrays of samplers must be done with
    constant expressions.  However, this extension specifies that results are
    undefined if the indices would diverge if multiple shader invocations are
    run in lockstep.  NV_gpu_shader5 does not impose the non-divergent
    indexing requirement.

    If NV_gpu_shader5 is supported, integer data types are supported with four
    different precisions (8-, 16-, 32-, and 64-bit) and floating-point data
    types are supported with three different precisions (16-, 32-, and
    64-bit).  The extension adds the following rule for output parameters,
    which is similar to the one present in this extension for input
    parameters:

       5. If the formal parameters in both matches are output parameters, a
          conversion from a type with a larger number of bits per component is
          better than a conversion from a type with a smaller number of bits
          per component.  For example, a conversion from an "int16_t" formal
          parameter type to "int" is better than one from an "int8_t" formal
          parameter type to "int".

    Such a rule is not provided in this extension because there is no
    combination of types in this extension and ARB_gpu_shader_fp64 where this
    rule has any effect.


Dependencies on ARB_sample_shading

    This extension builds upon the per-sample shading support provided by
    ARB_sample_shading to provide several new capabilities, including:

      * the built-in variable gl_SampleMaskIn[] indicates the set of samples
        covered by the input primitive corresponding to the fragment shader
        invocation; and

      * use of the "sample" qualifier on a fragment shader input forces
        per-sample shading, and specifies that the value of the input be
        evaluated per-sample.

    There is no interaction between the extensions, except that shaders using
    the features of this extension seem likely to use features from
    ARB_sample_shading as well.


Dependencies on ARB_texture_gather

    This extension builds upon the textureGather() built-ins provided by
    ARB_texture_gather to provide several new capabilities, including:

      * allowing shaders to select any single component of a multi-component
        texture to produce the gathered 2x2 footprint;

      * allowing shaders to perform a per-sample depth comparison when
        gathering the 2x2 footprint using shadow sampler types;

      * allowing shaders to use arbitrary offsets computed at run-time to
        select a 2x2 footprint to gather from; and

      * allowing shaders to use separate independent offsets for each of the
        four texels returned, instead of requiring a fixed 2x2 footprint.

    Other than the fact that they provide similar functionality, there is no
    interaction between the extensions.

    Since this extension requires support for gathering from multi-component
    textures, the minimum value of MAX_PROGRAM_TEXTURE_GATHER_COMPONENTS_ARB
    is increased to 4.


Errors

    INVALID_OPERATION is generated by GetProgram if <pname> is
    GEOMETRY_SHADER_INVOCATIONS and the program has not been linked
    successfully, or does not contain objects to form a geometry shader.


New State

    Add the following state to Table 6.40, Program Object State, p. 378

                                                    Initial
    Get Value                 Type   Get Command     Value     Description                  Sec.  Attribute
    ------------------------- ----  ------------    -------    -------------------------   ------  -------
    GEOMETRY_SHADER_           Z+    GetProgramiv      1       number of times a geometry  6.1.16    -
      INVOCATIONS                                              shader should be executed
                                                               for each input primitive

New Implementation Dependent State

                                               Min.
    Get Value               Type  Get Command  Value  Description                  Sec.      Attrib
    ----------------------  ----  -----------  -----  --------------------------   --------  ------
    MAX_GEOMETRY_SHADER_     Z+   GetIntegerv   32    maximum supported geometry   2.16.4      - 
      INVOCATIONS                                     shader invocation count
    MIN_FRAGMENT_INTERP-     R    GetFloatv    -0.5   furthest negative offset     3.12.1      -
      OLATION_OFFSET                                   for interpolateAtOffset()
    MAX_FRAGMENT_INTERP-     R    GetFloatv    +0.5   furthest positive offset     3.12.1      -
      OLATION_OFFSET                                   for interpolateAtOffset()
    FRAGMENT_INTERPOLATION_  Z+   GetIntegerv    4    subpixel bits for            3.12.1      -
      OFFSET_BITS                                      interpolateAtOffset()
    MAX_VERTEX_STREAMS       Z+   GetIntegerv    4    total number of vertex       2.16.4      -
                                                       streams

    (Note:  The minimum value for MAX_PROGRAM_TEXTURE_GATHER_COMPONENTS_ARB,
     added by ARB_texture_gather, is increased to 4.)

Issues

    (1) This extension builds on the capability provided by
        ARB_sample_shading, adding a new built-in variable for the input
        sample mask.  It seems likely that a shader using this mask might also
        want to use one or more ARB_sample_shading built-ins.  Are such
        shaders required to include #extension lines for both extensions?

      UNRESOLVED:  It would be nice if it wasn't required.

    (2) How do the per-sample shading features of this extension interact with
        non-multisample rendering?

      RESOLVED:  Non-multisample rendering (due to no multisample buffer or
      MULTISAMPLE disabled) is treated as single-sample rendering.

    (3) This extension lifts the restriction requiring that indices into
        samplers be constant expressions, but makes the results undefined if
        the indices used would diverge in lockstep execution.  What is this
        good for?

      RESOLVED:  This allows shaders to index into samplers using integer
      uniforms, or with non-divergent values computed at run-time (e.g., loop
      counters).  Many implementations of this extension will be SIMD, running
      multiple shader invocations at once, and some implementations may have
      difficulty with accessing multiple textures in a single SIMD
      instruction.

      Note that the NV_gpu_shader5 extension similarly lifts the restriction
      but does not require non-divergent indexing.

    (4) What sort of implicit conversions should we support in this and
        related extensions?

      RESOLVED:  In GLSL 1.50, we have implicit conversion from "int" and
      "uint" to "float", as well as equivalent conversions for vector types.
      One of the primary motivations of this feature is to allow constants
      that are nominally integer values to be used in floating-point contexts
      without requiring special suffixes.  The following code compiles
      successfully in GLSL 1.50.

        float square(float x) {
          return x * x; 
        }
        float f = 0;
        float g = f * 2;       
        float h = square(3); 

      The same code would fail on GLSL 1.10, because "0", "2", and "3" would
      need to be written as "0.0", "2.0", and "3.0", respectively.

      This extension adds implicit conversions from "int" to "uint" to allow
      for cases like:

        uint square(uint x) {
          return x * x;
        }
        uint v = square(2);

      This code is legal with this extension, but not in GLSL 1.50 ("2" would
      need to be replaced with "2U" or "uint(2)").

      ARB_gpu_shader_fp64 adds a new type "double", and we extend existing
      implicit conversions to allow for promotion of "int", "uint", and
      "float" to "double".

      Unlike C/C++, the general rule for implicit conversions in GLSL is that
      conversions are unidirectional.  If type A can be implicitly converted
      to type B, type B cannot be implicitly converted to type A.

    (5) Increasing the number of available implicit conversions means that
        there is the possibility of ambiguities in various operators.  How do
        we deal with these cases?

      RESOLVED:  For binary operators, the new implicit conversions mean that
      there may be multiple ways to resolve an expression.  For example, given
      the following declarations

        int i;
        uint u;

      the expression "i+u" could be resolved either by implicitly converting
      "i" to "uint", or by implicitly converting both values to either "float"
      or "double".  To resolve, we define a set of preferences for a common
      data type based on the types of the operands:

        - use a floating-point type if either operand is floating-point
        - use an unsigned integer type if either operand is unsigned
        - use a signed integer type otherwise

      If conversions to multiple precisions are supported, the
      lowest-precision available data type is preferred (e.g., int*float will
      be converted to float*float and not double*double).

      These rules should extend naturally if new basic data types are added.
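
      As a rough illustration only, the preference list above can be modeled
      on the CPU in C.  The names "glsl_type" and "common_type" are
      hypothetical and are not part of this extension or of any API; the
      sketch assumes only the four scalar types int, uint, float, and double.

```c
#include <assert.h>

/* Hypothetical model of the scalar types discussed above. */
typedef enum { T_INT, T_UINT, T_FLOAT, T_DOUBLE } glsl_type;

static int is_floating(glsl_type t)
{
    return t == T_FLOAT || t == T_DOUBLE;
}

/* Pick the common operand type for a binary operator such as "i+u". */
glsl_type common_type(glsl_type a, glsl_type b)
{
    assert((int)a >= T_INT && (int)a <= T_DOUBLE);
    assert((int)b >= T_INT && (int)b <= T_DOUBLE);

    /* Use a floating-point type if either operand is floating-point.
       The lowest available precision is preferred, so "double" is chosen
       only when one operand is already "double". */
    if (is_floating(a) || is_floating(b))
        return (a == T_DOUBLE || b == T_DOUBLE) ? T_DOUBLE : T_FLOAT;

    /* Use an unsigned integer type if either operand is unsigned. */
    if (a == T_UINT || b == T_UINT)
        return T_UINT;

    /* Use a signed integer type otherwise. */
    return T_INT;
}
```

      Under this model, "i+u" resolves to "uint" and int*float resolves to
      "float" rather than "double".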

    (6) Increasing the number of available implicit conversions means that
        there is an increased possibility of ambiguity when function
        overloading is involved.  Additionally, this and related extensions
        add new function overloads.  How do we deal with these cases?

      RESOLVED:  The general rule for function overloading in GLSL 1.50 is
      that we first check for a function prototype that exactly matches the
      parameters passed to a function call.  If no match exists, we check for
      prototypes that can be matched by implicit conversions.  If more than
      one matching prototype can be matched by conversion, the function call
      is considered ambiguous and results in a compilation error.

      Unfortunately, when adding new implicit conversions, it is possible for
      cases that were formerly unambiguous to become ambiguous.  For backward
      compatibility purposes, it would be desirable to ensure that shaders
      that succeeded in old language versions should still compile if
      "upgraded" to more recent versions/extensions.  However, the new
      conversions and overloads might make this more difficult without
      modifying other language rules.  For example, the following prototypes
      are available for the standard built-in function min() on scalar values
      when this extension and ARB_gpu_shader_fp64 are supported:

        int     min(int a, int b);
        uint    min(uint a, uint b);
        float   min(float a, float b);
        double  min(double a, double b);

      In GLSL 1.50, a function call such as:

        float f;
        min(f, 1);

      would be considered unambiguous because the double-precision version of
      min() didn't exist and the call matched only the single-precision
      version.  However, with double-precision, implicit conversions can be
      used to resolve to either the single- or double-precision versions.

      To resolve this issue, we provide a set of rules that can be used to
      resolve multiple candidates to a "best match".  The rules for
      determining a best match are similar to those for C++ function
      overloading, but not exactly the same.  Like C++, these rules compare
      the conversions required on an argument-by-argument basis.  A function
      prototype A is better than function prototype B if:

        - A is better than B for one or more arguments
        - B is better than A for no arguments

      If a single function prototype is better than all others, that one is
      used.  Otherwise, we get the same ambiguity error as on previous GLSL
      versions.

      As far as argument-by-argument comparisons go, the order of preference
      is:

        - favor exact matches
        - prefer "promotions" (float->double) to other conversions
        - prefer conversions from int/uint to float over similar conversions
          to double

      If none of the rules apply, one match is considered neither better nor
      worse than the other.

      With these rules, the "min(f,1)" example above resolves to the "float"
      version, as is the case in GLSL 1.50.  However, there are other cases
      where ambiguity remains.  For example, consider the prototypes:

        int f(uint x);
        int f(float x);

      With GLSL 1.50 rules, "f(3)" would match the floating-point version, as
      no implicit conversions existed from "int" to "uint".  With the new
      implicit conversions, both prototypes match and neither is preferred.
      Because of the ambiguity, "f(3)" would fail to compile with this
      extension enabled, but should still compile on implementations
      supporting this extension if the extension is not enabled in GLSL source
      code.
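
      The best-match rules above can be sketched as a small CPU-side model in
      C.  This is an illustration under stated assumptions, not the
      normative algorithm:  all names ("glsl_type", "prototype", "resolve",
      and the helpers) are hypothetical, only the scalar types int, uint,
      float, and double are modeled, and resolve() returns -1 both when no
      prototype matches and when the call is ambiguous.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the four scalar types discussed above. */
typedef enum { T_INT, T_UINT, T_FLOAT, T_DOUBLE } glsl_type;

#define MAX_PARAMS 4
typedef struct { glsl_type param[MAX_PARAMS]; } prototype;

/* 1 if `from` matches `to` exactly or converts to it implicitly. */
static int converts(glsl_type from, glsl_type to)
{
    if (from == to) return 1;
    switch (from) {
    case T_INT:   return to == T_UINT || to == T_FLOAT || to == T_DOUBLE;
    case T_UINT:  return to == T_FLOAT || to == T_DOUBLE;
    case T_FLOAT: return to == T_DOUBLE;
    default:      return 0;   /* nothing converts implicitly from double */
    }
}

/* 1 if passing a `from` argument to formal type `a` beats formal type `b`. */
static int arg_better(glsl_type from, glsl_type a, glsl_type b)
{
    int a_promo = (from == T_FLOAT && a == T_DOUBLE);
    int b_promo = (from == T_FLOAT && b == T_DOUBLE);
    if (a == b) return 0;
    if (from == a || from == b) return from == a;  /* favor exact matches */
    if (a_promo != b_promo) return a_promo;        /* prefer float->double */
    /* prefer int/uint -> float over int/uint -> double */
    return (from == T_INT || from == T_UINT) && a == T_FLOAT && b == T_DOUBLE;
}

/* 1 if every argument can be passed to prototype `p`. */
static int matches(const glsl_type *args, size_t n, const prototype *p)
{
    for (size_t i = 0; i < n; i++)
        if (!converts(args[i], p->param[i])) return 0;
    return 1;
}

/* 1 if `a` is better for at least one argument and `b` is better for none. */
static int proto_better(const glsl_type *args, size_t n,
                        const prototype *a, const prototype *b)
{
    int a_wins = 0;
    for (size_t i = 0; i < n; i++) {
        if (arg_better(args[i], b->param[i], a->param[i])) return 0;
        if (arg_better(args[i], a->param[i], b->param[i])) a_wins = 1;
    }
    return a_wins;
}

/* Index of the single prototype better than all other matches, or -1. */
int resolve(const glsl_type *args, size_t n,
            const prototype *protos, size_t count)
{
    assert(n <= MAX_PARAMS);
    for (size_t i = 0; i < count; i++) {
        int best = matches(args, n, &protos[i]);
        for (size_t j = 0; j < count && best; j++)
            if (j != i && matches(args, n, &protos[j]))
                best = proto_better(args, n, &protos[i], &protos[j]);
        if (best) return (int)i;
    }
    return -1;
}
```

      In this model, a (float, int) call against the four min() prototypes
      selects the "float" version, while an (int) call against f(uint) and
      f(float) is reported as ambiguous.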

    (7) The function overloading rules described in this extension describe
        conversions between data types with different sizes.  However, all
        existing data types allowing implicit conversion (int, uint, float)
        are the same size.  Why do we specify these rules?

      RESOLVED:  This extension is specified at the same time as the related
      ARB_gpu_shader_fp64 and NV_gpu_shader5 extensions, which do provide such
      types.  The rules are specified all in one place here so we don't have
      to replicate and extend the rules in the other extensions.  It also
      provides the ability to automatically convert from signed to unsigned
      integer types, as in the C programming language.

    (8) Should we support textureGather() for rectangle textures
        (sampler2DRect)?  They aren't in ARB_texture_gather.

      RESOLVED:  Yes.

    (9) How does the input sample mask interact with the fixed-function
        SampleCoverage and SampleMask state?  Will samples be removed from the
        input mask if they would be eliminated by these masks in the
        per-fragment operations?

      UNRESOLVED.

    (10) Should we support reading patches as geometry shader inputs, and if
         so, where?

      RESOLVED:  Not in this extension.  This capability will be provided in
      NV_gpu_shader5.

    (11) Should we support per-sample interpolation of attributes?  If so,
         how?

      RESOLVED:  Yes.  When multisample rasterization is enabled, qualifying
      one or more fragment shader inputs with "sample" will force per-sample
      interpolation of those attributes.  If the same shader includes other
      fragment inputs not qualified with sample, those attributes may be
      interpolated per-pixel (i.e., all samples get the same values, likely
      evaluated at the pixel center).

    (12) Should we reserve "sample" as a keyword for per-sample interpolation
         qualifiers, or use something more obscure, such as "per_sample"?

      RESOLVED:  This extension uses "sample".

    (13) What should be the base data type for the bitCount(), findLSB(), and
         findMSB() functions -- signed or unsigned integers?

      RESOLVED:  These functions will return signed values, with -1 returned
      by findLSB/findMSB if no bit is found.  Note that the shading language
      supports implicit conversions of signed integers to unsigned, which
      makes it easy enough if an unsigned result is desired.
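
      A minimal C model of the return-value convention described above is
      sketched below.  The function names are hypothetical stand-ins for the
      GLSL built-ins, and only the unsigned ("uint") overloads are modeled;
      the signed overload of findMSB() treats negative inputs differently and
      is not covered here.

```c
#include <assert.h>
#include <stdint.h>

/* Number of bits set to 1 (models bitCount() on a uint). */
int bit_count(uint32_t v)
{
    int n = 0;
    while (v) {
        v &= v - 1;   /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Index of the least significant 1 bit, or -1 if no bit is set. */
int find_lsb(uint32_t v)
{
    int i = 0;
    if (v == 0)
        return -1;
    while (!(v & 1u)) {
        v >>= 1;
        i++;
    }
    return i;
}

/* Index of the most significant 1 bit, or -1 if no bit is set. */
int find_msb(uint32_t v)
{
    int i = -1;
    while (v) {
        v >>= 1;
        i++;
    }
    assert(i < 32);
    return i;
}
```

      Converting the signed result to unsigned, as the implicit conversion
      would, turns the "not found" value -1 into 0xFFFFFFFF.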

    (14) Why do EmitVertex() and EndPrimitive() begin with capitalized words
         while most of the other built-ins start with a lower-case (e.g.,
         emitVertex)?  Which precedent should the new per-vertex stream emit
         and end primitive functions follow?

      RESOLVED:  The inconsistency began with the original functions in
      EXT_geometry_shader4; the spec author can't recall the original reasons
      (if any).  Regardless, we decided to match the existing functions as
      closely as possible and use EmitStreamVertex() and EndStreamPrimitive().

    (15) How do the textureGather functions work with sRGB textures?

      RESOLVED:  Gamma-correction is applied to the texture source color
      before "gathering" and hence applies to all four components, unless the
      texture swizzle of the selected component is ALPHA in which case no
      gamma-correction is applied.

    (16) How should we support arrays of uniform blocks (i.e., multiple blocks
         in a group, each backed by a separate buffer object)?

      RESOLVED:  We will use instance names in the block definitions, which
      can be declared as regular arrays:

        uniform UniformData {
          vec4 stuff;
        } blocks[4];

      These four blocks will be referred to as "blocks[0]" through
      "blocks[3]" in shader code, and "UniformData[0]" through "UniformData[3]"
      in the OpenGL API code.  The block member in this example will be
      referred to as "UniformData.stuff" in the API.  A similar approach was
      already adopted in GLSL 1.50, where geometry shaders supported arrays of
      input blocks that were treated similarly.  Since this spec depends on
      GLSL 1.50, little new spec language is required here.

    (17) What are instanced geometry shaders useful for?

      RESOLVED:  Instanced geometry shaders allow geometry programs that
      perform regular operations to run more efficiently.

      Consider a simple example of an algorithm that uses geometry shaders to
      render primitives to a cube map in a single pass.  Without instanced
      geometry shaders, the geometry shader to render triangles to the cube
      map would do something like:

        for (face = 0; face < 6; face++) {
          for (vertex = 0; vertex < 3; vertex++) {
            project vertex <vertex> onto face <face>, output position
            compute/copy attributes of emitted <vertex> to outputs
            output <face> to result.layer
            emit the projected vertex
          }
          end the primitive (next triangle)
        }

      This algorithm would output 18 vertices per input triangle, three for
      each cube face.  The six triangles emitted would be rasterized, one per
      face.  Geometry shaders that emit a large number of attributes have
      often posed performance challenges, since all the attributes must be
      stored somewhere until the emitted primitives are processed.  Large
      storage requirements may limit the number of threads that can be run in
      parallel and reduce overall performance.

      Instanced geometry shaders allow this example to be restructured to run
      with six separate invocations, one per face.  Each invocation projects
      the triangle to only a single face (identified by the invocation number)
      and emits only 3 vertices.  The reduced storage requirements allow more
      geometry shader invocations to be run in parallel, with greater overall
      efficiency.

      Additionally, the total number of attributes that can be emitted by a
      single geometry shader invocation is limited.  However, for instanced
      geometry shaders, that limit applies to each of <N> invocations which
      allows for a larger total output.  For example, if the GL implementation
      supports only 1024 components of output per invocation, the 18-vertex
      algorithm above could emit no more than 56 components per vertex.  The
      same algorithm implemented as a 3-vertex 6-invocation geometry program
      could theoretically allow for 341 components per vertex.
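
      The per-vertex figures quoted above follow from integer division of
      the per-invocation limit by the number of vertices emitted.  The C
      helper below is a hypothetical illustration of that arithmetic; the
      value 1024 is the example limit used in this issue, not a queried
      implementation constant.

```c
#include <assert.h>

/* Largest per-vertex component count that fits within a per-invocation
   output limit when a single invocation emits `vertices` vertices. */
int max_components_per_vertex(int total_limit, int vertices)
{
    assert(total_limit >= 0 && vertices > 0);
    return total_limit / vertices;   /* integer division rounds down */
}
```

      With a limit of 1024 components, 18 vertices per invocation allow 56
      components per vertex, and 3 vertices per invocation allow 341.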

    (18) Should EmitStreamVertex() and EndStreamPrimitive() accept a
         non-constant stream number?

      RESOLVED:  Not in this extension.  Requiring a constant stream number
      for each call simplifies code generation for the compiler.

    (19) Are there any restrictions on geometry shaders with multiple output
         streams?

      RESOLVED:  Yes, such geometry shaders are required to generate points;
      line strip and triangle strip outputs are not supported.

    (20) Since multi-stream geometry shaders only support points, why does
         EndStreamPrimitive() exist?  Neither it nor EndPrimitive() does
         anything useful when emitting points.

      RESOLVED:  This function was added for completeness, and would be useful
      if the requirement for emitting points were lifted by a future
      extension.

    (21) Should we provide mechanisms allowing shaders to examine or set the
         bit representation of floating-point numbers?

      RESOLVED:  Yes, we will provide functions to convert single-precision
      floats to/from signed and unsigned 32-bit integers.  The
      ARB_gpu_shader_fp64 extension will provide similar functionality for
      double-precision floats.  We chose to adopt the Java naming convention
      here -- converting a single-precision float to/from a signed integer is
      accomplished by the functions floatBitsToInt() and intBitsToFloat().

      Note that this functionality has also been forked off into a separate
      extension (ARB_shader_bit_encoding) that can be exported on
      implementations capable of performing such conversions but not capable
      of the full feature set of this extension and/or OpenGL 4.0.
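
      For readers more familiar with C, the effect of these functions is
      comparable to copying the bytes of a value into a variable of the other
      type.  The sketch below is a CPU-side analogy with hypothetical
      function names, and it assumes a platform where "float" is a 32-bit
      IEEE 754 value.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret the bits of a float as a signed 32-bit integer. */
int32_t float_bits_to_int(float f)
{
    int32_t i;
    assert(sizeof f == sizeof i);
    memcpy(&i, &f, sizeof i);   /* copy the bit pattern, no conversion */
    return i;
}

/* Reinterpret the bits of a signed 32-bit integer as a float. */
float int_bits_to_float(int32_t i)
{
    float f;
    assert(sizeof f == sizeof i);
    memcpy(&f, &i, sizeof f);
    return f;
}
```

      On such a platform, the value 1.0 maps to the bit pattern 0x3F800000,
      and converting in one direction and back returns the original value.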

    (22) What is the "precise" qualifier good for?

      RESOLVED:  Like "invariant", "precise" provides some invariance
      guarantees that are useful for certain algorithms.

      With an output position qualified as "invariant", we ensure that if the
      same geometry is processed by multiple shaders using the exact same
      code, it will be transformed in exactly the same way to ensure that we
      have no cracking or flickering in multi-pass algorithms using different
      shaders.

      With "precise", we ensure that an algorithm can be written to produce
      identical results on subtly different inputs.  For example, the order of
      vertices visible to a geometry or tessellation shader used to subdivide
      primitive edges might present an edge shared between two primitives in
      one direction for one primitive and the other direction for the adjacent
      primitive.  Even if the weights are identical in the two cases, there
      may be cracking if the computations are being done in an order-dependent
      manner.  If the position of a new vertex were provided by evaluating the
      function f() below with limited-precision floating-point math, it's not
      necessarily the case that f(a,b,c) == f(c,b,a) in the following code:

          float f(float x, float y, float z) 
          {
            return (x + y) + z;
          }

      This function f() can be rewritten as follows with "precise" and a
      symmetric evaluation order to ensure that f(a,b,c) == f(c,b,a).

          float f(float x, float y, float z)
          {
            // Note that we intentionally compute "(x+z)" instead of "(x+y)"
            // here, because that value will be the same when <x> and <z> 
            // are reversed.
            precise float result = (x + z) + y;
            return result;
          }
      
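      With this version, f(a,b,c) evaluates (a + c) + b and f(c,b,a)
      evaluates (c + a) + b; the two are identical because floating-point
      addition is commutative.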

      The "precise" qualifier will disable certain optimizations and thus
      carries a performance cost.  The cost may be higher than "invariant",
      because "invariant" permits optimizations disallowed by "precise" as
      long as the compiler ensures that it always optimizes in the exact same
      manner.
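
      The order dependence described in this issue can be reproduced on the
      CPU.  The C sketch below is an analogy only:  C has no "precise"
      qualifier, so "volatile" temporaries are used here as a stand-in to
      force each intermediate sum to be rounded to single precision in the
      order written.  The function names are hypothetical.

```c
#include <assert.h>   /* assert() is handy when exercising these helpers */

/* Evaluates (x + y) + z; the result depends on argument order. */
float f_order_dependent(float x, float y, float z)
{
    volatile float t = x + y;
    volatile float r = t + z;
    return r;
}

/* Evaluates (x + z) + y; swapping <x> and <z> gives the same result. */
float f_symmetric(float x, float y, float z)
{
    volatile float t = x + z;
    volatile float r = t + y;
    return r;
}
```

      For example, with a = 1.0e30, b = -1.0e30, and c = 1.0, the
      order-dependent form yields 1.0 for (a,b,c) but 0.0 for (c,b,a),
      because the 1.0 is lost when it is added to a value of magnitude
      1.0e30.  The symmetric form yields 0.0 in both cases.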

    (23) What computations will be affected by the "precise" qualifier, and
         what computations aren't?

      RESOLVED:  We will ensure precise computation of any expressions within
      a single function used directly or indirectly to produce the value of a
      variable qualified as "precise".

      We chose not to provide this guarantee across function boundaries, even
      if the results of a function are used in the computation of an output
      qualified as "precise".  Algorithms requiring the use of "precise" may
      have a mix of computations, some required to be precise, some not.  This
      function boundary rule may serve to limit the amount of computation
      indirectly forced to be precise.

      Additionally, the function boundary rule permits non-precise
      sub-operations in a computation required to be precise.  For example, a
      shader might need to compute a "precise" position by taking a weighted
      average as in the following code:

        precise vec3 pos = (p[0]*w[0] + p[1]*w[1]) + (p[2]*w[2] + p[3]*w[3]);

      However, if the main precision requirement is that the same result be
      generated when <p> and <w> are reversed, the following code also gets
      the job done, even if posmad() is implemented with multiply-add
      operations.

        vec3 posmad(vec3 p0, float w0, vec3 p1w1) { return p0*w0+p1w1; }
        precise vec3 pos = (posmad(p[0], w[0], p[1]*w[1]) +
                            posmad(p[3], w[3], p[2]*w[2]));

      To generate precise results within a function, the function arguments
      and/or temporaries within the function body should be qualified as
      "precise" as needed.

      Note that when applying "precise" rules to assignments, indirect
      application of this rule applies on an assignment-by-assignment basis.
      In the following perverse example:

        float a,b,c,d,e,f;
        precise float g;
        f = a + b + c;
        ...
        f = c + d + e;
        g = f * 2.0;

      The first assignment to <f> need not be treated as "precise", since the
      value assigned will have no effect on the final value of the
      precise-qualified <g>.  The second assignment to <f> must be evaluated
      precisely.  The fact that one assignment to a variable needs to be
      treated as precise does not mean that the variable itself is implicitly
      treated as "precise".

    (24) Are "precise" qualifiers allowed on function arguments?  If so, what
         do they mean?  Can a return value for a function be declared as
         precise?

      RESOLVED:  Yes; the rules permit the use of "precise" on any variable
      declaration, including function arguments.  The code

        float f(precise in vec4 arg1, precise out vec4 arg2) { ... }

      specifies that any expressions used to assign values to <arg1> or <arg2>
      within f() will be evaluated in a precise manner.

      Expressions used to derive the value passed to the function f() as
      <arg1> will be treated as precise according to the normal rules.  The
      expression for <arg1> is treated as precise if and only if the function
      call is on the right-hand side of an assignment to a variable qualified
      as "precise" or is indirectly used in an assignment to such a variable.
      It is not automatically treated as precise just because the formal
      parameter <arg1> is qualified with "precise".  

      For the purposes of this rule, variables passed as "out" parameters do
      not count as assignments.  Values assigned to an output parameter will
      not be evaluated precisely just because the caller provides a variable
      qualified as "precise".  When the output parameter itself is qualified
      as "precise", precise evaluation of that output is required within the
      callee.

      We chose not to permit function return values to be qualified as
      "precise", though we could have hypothetically allowed code such as:

        precise float f(float a, float b, float c) { return (a+b)+c; }

      To obtain a precise return value in such a case, use code such as:

        float f(float a, float b, float c) 
        {
          precise float result = (a+b) + c;
          return result;
        }

    (25) How does texture gather interact with incomplete textures?

      RESOLVED:  For regular texture lookups, incomplete textures are
      considered to return a texel value with RGBA components of (0,0,0,1).
      For texture gather operations, each texel in the sampled footprint is
      considered to have RGBA components of (0,0,0,1).  When using the
      textureGather() function to select the R, G, or B component of an
      incomplete texture, (0,0,0,0) will be returned.  When selecting the A
      component, (1,1,1,1) will be returned.
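
      The behavior described above can be summarized with a small C model.
      The type and function names are hypothetical; the sketch simply treats
      every texel of the footprint as (0,0,0,1) and returns the selected
      component four times.

```c
#include <assert.h>

typedef struct { float v[4]; } vec4;

enum { COMP_R = 0, COMP_G = 1, COMP_B = 2, COMP_A = 3 };

/* Result of gathering component `comp` from an incomplete texture. */
vec4 gather_incomplete(int comp)
{
    /* Every texel in the 2x2 footprint is treated as (0,0,0,1). */
    static const float texel[4] = { 0.0f, 0.0f, 0.0f, 1.0f };
    vec4 result;
    int i;

    assert(comp >= COMP_R && comp <= COMP_A);
    for (i = 0; i < 4; i++)
        result.v[i] = texel[comp];   /* same component from each texel */
    return result;
}
```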


Revision History

    Rev.    Date    Author    Changes
    ----  --------  --------  -----------------------------------------
    16    03/30/12  pbrown    Fix typo in language restricting the use of
                              EmitStreamVertex()/EndStreamPrimitive() to 
                              programs with an output primitive type of 
                              points, not an input type of points (bug 8371).

    15    10/17/11  pbrown    Fix prototypes for textureGather and
                              textureGatherOffset to use vec2 coordinates for
                              "2DRect" sampler versions (bug 7964).

    14    01/27/11  pbrown    Add further clarification on the interaction
                              of texture gather and incomplete textures (bug
                              7289).

    13    09/24/10  pbrown    Clarify the interaction of texture gather
                              with swizzle (bug 5910), fixing conflicts
                              between API and GLSL spec language.
                              Consolidate into one copy in the API 
                              spec.

    12    03/23/10  pbrown    Update issues section, both fixing/numbering
                              existing issues and including other issues 
                              that were left behind in NV_gpu_shader5 when the
                              specs were refactored.

    11    03/23/10  Jon Leech Describe <offset> to interpolateAtOffset
                              without implying it is a constant expression
                              (Bug 6026).

    10    03/07/10  pbrown    Fix typo in an output stream qualifier example.

     9    03/05/10  pbrown    Modify function overloading rules to remove
                              most preferences when converting between
                              two different types.  The only preferences
                              that remain are promoting "float" to "double"
                              over other conversions, and preferring 
                              conversion of integers to "float" to converting
                              to "double" (bug 5938).
                              
     8    01/29/10  pbrown    Update the spec to require that the minimum
                              value for MAX_PROGRAM_TEXTURE_GATHER_-
                              COMPONENTS is 4 (bug 5919).

     7    01/21/10  pbrown    Clarify the rules for determining a best match
                              if implicit conversions can result in multiple
                              matching function prototypes.  Modify the rules
                              to pick a best match by comparing pairs of
                              functions, and using any function deemed better
                              than any other choice.  Modify the argument
                              conversion preference rules for overloading to
                              disfavor "int" to "uint" conversions, for
                              backward compatibility with previous GLSL
                              versions.  Add some new discussion of the
                              choices involved to the issues section (bug 
                              5938).

     6    01/14/10  pbrown    Minor wording updates from spec reviews.

     5    12/10/09  pbrown    Functionality updates from spec review:
                              Rename fmad to fma.  Fix error in spec
                              language for negative diffs in usubBorrow.

     4    12/10/09  pbrown    Convert from EXT to ARB.

     3    12/08/09  pbrown    Miscellaneous fixes from spec review:  Added
                              missing implementation constants for
                              interpolation offset range and granularity;
                              added explicit section to OpenGL spec describing
                              shader requested interpolation modifiers and
                              functions.  Clean up more dangling "ThreadID"
                              references.  General typo fixes and language
                              clarifications.

     2    10/01/09  pbrown    Renamed gl_ThreadID to gl_InvocationID.

     1              pbrown    Internal revisions.
