- Dave Morrison <drmorris@mit.edu> - 2/3/94 -
updates for 4.9.3 by Chris Davis <ckd@kei.com> 6 Jun 1994

Changes to the shared library setup have lots of little pitfalls and mines.
This is an attempt to map the minefield, for those who feel they've
noticed something that they think should be done another way.


* What's shared, what's static

The purpose of these modifications to Sun's libc.so is to provide DNS
lookup for gethostby* and, if you desire, getnetby*.  This involves
replacing the following SunOS libc routines:

	gethostbyname			getnetbyname
	gethostbyaddr			getnetbyaddr
	gethostent			getnetent
	sethostent			setnetent
	endhostent			endnetent

The routines use the res_* routines from the resolv library to get their
information from DNS.  Because it is most convenient, all these objects
are linked into the shared library, meaning they are linkable without
using -lresolv.  Full details are given below, but unless you want to
get into the nitty gritty, obey the following rule:

Anything which uses -lresolv routines other than the stock OS routines
above should link using -lresolv.

The symptom of not obeying this rule is finding that _res is unresolved
at link time.


* global variable collision

The global variable _res is particularly troublesome.  Any executable
which was linked with -lresolv before the shared library was installed
has _res statically compiled in as a global data structure.
Unfortunately, the resolv library in 4.9.2 BIND also has a global
variable _res, and it is defined slightly differently.  When the shared
libraries are loaded, some linking is done at run time.  The runtime
linker notices that _res is statically defined and does not link in the
dynamic version.  This means that if the shared libc routines were ever
called from this executable, they would write to the static version.
Since the static version is a smaller data structure, this could
overwrite bits of memory.  Not good.  It turns out the worst case is not
a likely scenario, but I'd rather be safe than sorry.
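
To make the hazard concrete, here is a minimal C sketch.  The two struct
layouts are hypothetical stand-ins, not the real SunOS or BIND 4.9.2
definitions of _res; the point is only that code built against the
larger layout can touch memory past the end of storage reserved for the
smaller one.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the older, smaller _res layout that was
 * statically linked into an existing executable. */
struct old_state {
	int  retrans;
	int  retry;
	long options;
};

/* Hypothetical stand-in for the newer, larger layout the shared
 * library code was compiled against. */
struct new_state {
	int  retrans;
	int  retry;
	long options;
	char extra[32];		/* fields the old layout does not have */
};

/* Number of bytes beyond the old object that code using the new layout
 * may touch, or 0 if the new layout is not larger. */
size_t
overrun_bytes(void)
{
	/* The shared leading fields sit at the same offsets, which is why
	 * the mismatch goes unnoticed until the extra fields are written. */
	assert(offsetof(struct old_state, options) ==
	    offsetof(struct new_state, options));

	if (sizeof(struct new_state) <= sizeof(struct old_state))
		return 0;
	return sizeof(struct new_state) - sizeof(struct old_state);
}
```

Whatever the linker happened to place after the smaller object is what
gets overwritten, which is why the damage is hard to predict.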

This is why the Makefile for contrib/sunlibc does -D_res=_res_shlib.
The collision is removed.  This means that _res is not accessible as a
global variable in the shared libc library.  To compile a program which
accesses _res directly, libresolv must be linked in statically.
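
As a rough illustration of what the -D flag does, the sketch below
applies the same rename with a #define.  The struct is a hypothetical
stand-in for the real resolver state, and res_symbol_name() and
check_rename() are made-up helpers for the demonstration; only the
rename itself comes from the Makefile.

```c
#include <assert.h>
#include <string.h>

/* What -D_res=_res_shlib on the compiler command line amounts to. */
#define _res _res_shlib

/* Hypothetical stand-in for the resolver state structure. */
struct state {
	int retrans;
	int retry;
};

/* Written as "_res" in the source, but the preprocessor rewrites it,
 * so the object file defines the symbol _res_shlib instead. */
struct state _res = { 5, 4 };

/* Two-level stringification so the macro is expanded before quoting. */
#define STR(x)   #x
#define XSTR(x)  STR(x)

/* Returns the identifier that "_res" really names after preprocessing. */
const char *
res_symbol_name(void)
{
	return XSTR(_res);
}

/* Confirms that the object is reachable under its renamed identifier. */
void
check_rename(void)
{
	assert(strcmp(res_symbol_name(), "_res_shlib") == 0);
	assert(_res_shlib.retrans == 5);
}
```

An executable carrying its own static _res never sees the renamed
symbol, so the two can no longer be confused by the runtime linker.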

This would not be a problem if you could recompile any code which used
libresolv.  This would mean recompiling some of SunOS and perhaps other
vendor code if you've obtained additional software.  Since people don't
generally have the source to everything on the machine, this isn't a
viable option except for Sun and miscellaneous wizards.


* Having named and tools linked with a shared libc.

It is very tempting and almost doable to compile the entire bind
distribution with a resolv in a shared libc.  There are dangers
associated with doing this.  First, there's the global variable
collision problem mentioned above.  Second, there's the problem of
maintaining shared library version control.

People have a tendency to copy tools like dig or the named server from
machine to machine.  If the new shared library (the one with *this*
distribution's resolv) is not present on the machines to which these
goodies are copied, the user will be getting SUN'S copy of resolv.
This could cause you to lose most heinously, and you will spend DAYS if
not WEEKS trying to figure out what the problem is.  It's debatable if
there's even a performance improvement by doing the sharing.  Compare
that to the debugging and frustration time you are going to spend.

You will also need to replace libc everywhere whenever a new release
comes out.  This isn't as big an issue for a production release of bind,
but for the alpha test team, not sharing means a few fewer things to
worry about, when there is already plenty to worry about.

Again, if you could recompile the machine, there wouldn't be a problem.
Vendors should release the tools and server shared, as they already have
the assurance that there is a standard libc, and users may want to
handle some problem routines by relinking the shared library.


* shared archives

In addition to a shared object (the libc.so files) which contains the
executable libc code, there is also a shared archive (the libc.sa
files).  The shared archive contains global initialized data.  When a
program is linked, if it accesses any of this global initialized data,
that data is included from the shared archive in the final executable.
Some examples include errno (initialized to zero), the ctype.h tables,
sys_errlist, and _iob for stdio.

If this data is not accessible from a shared archive, but is accessible
from the shared object (e.g. no libc.sa.x.y.z exists for libc.so.x.y.z),
the shared object copy will be used, but not linked into the executable.
This results in a performance hit for executables which use that data.
Sun's documentation claims this to be possibly degrading to the system
as a whole on a heavily used library.  I have yet to observe anything
besides a slight (max 10%) performance hit.

This is why it is important to copy+ranlib the old libc.sa.a.b.c, when
creating a new libc.so.x.y.z.  Sun's instructions in building a new
shared libc (shlib.etc package or patch) neglect to mention this.

There are 5 instances of global initialized data in -lresolv.  They are
_res (renamed to _res_shlib), _res_resultcodes, _res_opcodes, h_errlist,
and h_nerr.  In principle, they should be added to libc.sa.x.y.z.
However, as long as they are never referenced, it does not matter that
they are not there.  Programs which use these variables should link with
-lresolv to get the static version, and the problem is solved.

The reason for not including them in the shared archive is that if this
global data ever changed, as it might in a future bind release, the
MAJOR version of the library would have to change.  By using the static
versions with -lresolv, you allow yourself the option to upgrade the
-lresolv code without major fuss.

Update: in 4.9.3, the resolver library no longer uses initialized static
data, so this should never be a problem again.  (You should still copy and
re-ranlib the Sun-supplied libc.sa, however.)


* shared library revision numbers

Technically, the shared library changes are sufficient to warrant a
minor revision change.  On SunOS 4.1.3, this would mean the shared
library should be numbered libc.so.1.9.  However, Sun could use that
number in another SunOS release, and if you went to upgrade, suddenly
there would be two libc.so.1.9's.  Programs would be compiled to use
"libc.so.1.9" and there would be no distinction between those which want
to use the SunOS libc.so.1.9 and those which want the locally compiled
libc.so.1.9.  At this point, the locally compiled libc.so.1.9 should
really be 1.10, and you have to recompile everything you originally
compiled, anyway.
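
The numbering argument relies on revision numbers being compared
component by component as integers, so that 1.10 is newer than 1.9.  A
minimal C sketch of that comparison follows.  It is not Sun's ld.so
code; parse_rev() and compare_rev() are hypothetical helpers written
for this illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Parse "libc.so.MAJOR.MINOR" (any further ".x" suffix is ignored).
 * Returns 1 on success, 0 if the name does not match. */
int
parse_rev(const char *name, int *major, int *minor)
{
	assert(name != NULL && major != NULL && minor != NULL);
	return sscanf(name, "libc.so.%d.%d", major, minor) == 2;
}

/* Compare two library names by revision: <0, 0, >0 like strcmp.
 * Names that fail to parse compare as revision 0.0. */
int
compare_rev(const char *a, const char *b)
{
	int amaj = 0, amin = 0, bmaj = 0, bmin = 0;

	if (!parse_rev(a, &amaj, &amin))
		amaj = amin = 0;
	if (!parse_rev(b, &bmaj, &bmin))
		bmaj = bmin = 0;
	if (amaj != bmaj)
		return amaj < bmaj ? -1 : 1;
	if (amin != bmin)
		return amin < bmin ? -1 : 1;
	return 0;
}
```

A plain strcmp() would order "libc.so.1.10" before "libc.so.1.9"; the
numeric comparison does not.  Note that two different libraries both
named libc.so.1.9 compare equal, which is exactly the ambiguity
described above.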

So, stick with libc.so.1.8.x++.  Just be aware that if you compile on a
machine with this new shared library, and you use the res_ routines
directly without -lresolv (uncool, see above) you will not be able to
take it to a previous stock SunOS without a few problems.

Update: SunOS 4.1.3_U1 (aka Solaris 1.1.1) does, in fact, use libc.so.1.9.


* Compiling with gcc

Compiling resolv with gcc is highly preferable, as it understands the
concept of making read-only data shared.  Sun's 4.1.3 cc doesn't (simply
making read-only strings shared takes some nasty effort).

Currently (4.9.2 resolv and gcc 2.5.8), the resolv library does not
create any special gcc references.  Specifically, there are no
unresolved references in the resolv objects that are present in
libgcc.a.  This means that even if you compile with gcc, the objects
created may be linked with any compiler.  All is cool, use gcc.

SHOULD THIS CHANGE (in a new release of gcc, or resolv - not likely to
change, but possible), you can still use gcc and create objects usable
by any compiler.  You will need to add libgcc.a to the shared library
link line (before -ldl).

