NapShare README and Instruction Manual

Contents:

Introduction
Installation
Quick Start
Automation
Automation Filters
Advanced Stuff
How Push Works
Bug Reports
Variables
Developers, Programmers and Other Tech Info
Authors


Introduction
------------

NapShare is an automated Gnutella client.

It's a fully featured automated client designed to be run 24/7 and
shares any type of file the user wishes to share.

It is a Gnutella clone for Unix and needs GTK+ (1.2 or above). Gnome is
not needed. It is currently developed and tested under Linux (Mandrake)
and also works under KDE. It is known to run at least on Linux and
FreeBSD (on 80x86 machines). It is released under the GNU General Public
License (GPL) and is "open source". By using it you release the authors
of NapShare from any and all liability resulting from its use. You use
this program at your own risk.

It uses GNU style "portable" routines, so should compile on any platform.

The NapShare Home Page is:
http://napshare.sourceforge.net

The Project Page is:
http://sourceforge.net/projects/napshare/

The NapShare Open Discussion Forum is at:
http://www.gnutellaforums.com/

For further information on Gnutella, try:
http://gnutella.wego.com

You may also find the "NapNews" newsreader program useful in obtaining
more files to share. See the home page for more info.


Installation
------------

You may need to obtain the latest version of Glade to compile this
without warning messages. See freshmeat.net or other Linux sites for
updates. Try compiling first, though: the package is shipped ready to
compile without invoking Glade.

Ignore the messages "const qualifier ignored on asm" if you get them.
They happen because a _const_ directive is ignored; the directive was
simply removed in glib 2.X, so it's not a problem.

See the file "INSTALL" for further information.



Quick Start
-----------

Operation is fairly straightforward. Default values are set for getting
started, unless you are behind a very strict firewall. This section
should get you started quickly.

You should have two directories created to store your files, "downloads"
and "done".

NapShare uses the GWebCache host cache system to kick start it when you
have no known Gnutella hosts to connect to.

You will need to copy the supplied file called "NAPS-urlcache.txt"
to your ".napshare" settings directory, or put it in the same directory
as napshare was started from. The file is a list of URLs for GWebCache
host cache sites. The supplied file may be old; to find new GWebCache
URLs do a Google or Yahoo search for "gwebcache". Keep about 50 good
URLs in this file. NapShare picks one at random to start with.
A sample "NAPS-urlcache.txt" file is on the home page.
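
The start-up choice described above can be pictured with a short Python
sketch. It is only an illustration, not NapShare's actual code: it
assumes the cache file holds one URL per line, and the two URLs shown
are made-up placeholders.

```python
import random

def load_cache_urls(text):
    """Return the non-empty lines of a URL cache file (assumed one URL per line)."""
    return [line.strip() for line in text.splitlines() if line.strip()]

def pick_start_url(urls, rng=random):
    """Pick one cache URL at random to contact first."""
    return rng.choice(urls)

# Placeholder contents standing in for "NAPS-urlcache.txt".
sample = "http://cache-one.example/gwc.php\n\nhttp://cache-two.example/gwc.php\n"
urls = load_cache_urls(sample)
print(pick_start_url(urls))
```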

After you have connected once, you will have a reserve list of other hosts
and things will go faster next time.

Share some files! Some hosts will disconnect if you are not sharing
any files.

If you still don't get connected, you may need to check your Internet
connection and DNS (/etc/resolv.conf) list. If your Mozilla or Konqueror
web browser isn't able to browse the web you have a problem with your
Internet setup. Get that working first. If you are behind a firewall
and can browse, NapShare should work (but see the section on "push"
below). In normal operations you should be connected within about 30
seconds. Each host will supply a list of other current hosts so the list
is always updated.

Click on "Config" in the upper left hand box to get the Configuration
Screen, then select your downloads and done directories. Also select the
"done" directory as "Paths to Files"; this will share any files that
have finished downloading with others on the network. The "downloads"
directory is where incomplete files are stored.

Please share your files! This is what Gnutella is all about. It doesn't
take much of your connection bandwidth because it's outgoing packets. It
also helps the network if you leave NapShare running even if you aren't
using it. 24/7 is even better!

Once you are connected to a few hosts (2 or 3 is good) you can perform
a search. Click on "Search" in the upper left hand box to get the Search
Screen, then type in a keyword or two separated by spaces. Gnutella uses
"AND" for keyword searches, so if you type two words both will have to
be in the file name to get a return. In most situations it takes about
30 seconds for search results to begin to be returned. Be patient! You
are searching about 10,000 hosts.
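
As a rough picture of the "AND" rule, here is a small Python sketch of
how a host might decide whether one of its file names answers a search.
The function name is hypothetical, and ignoring upper/lower case is an
assumption, not something this manual specifies.

```python
def matches_all_keywords(file_name, query):
    """True when every space-separated keyword in query occurs in file_name."""
    name = file_name.lower()  # assumption: matching ignores case
    return all(word in name for word in query.lower().split())

print(matches_all_keywords("Linux_Sounds_01.mp3", "linux sounds"))
print(matches_all_keywords("linux_kernel.tar", "linux sounds"))
```

The first call matches because both words are present; the second does
not because "sounds" is missing from the name.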

Once you see a file name you like, click on it and press the button at
the bottom "Downloads Selected Files" and the file will be queued for
download. Click on "Downloads" in the upper left hand box to switch
to the Downloads screen and see what's going on.

In the Config Screen, "Paths to Files" is the shared directory; the
files in it are the ones you will share to the network. You should
always share your done directory. When downloaded files are complete,
they are moved to the done directory and can then be uploaded by others.

The client is pretty smart and won't "hammer" a host. Even if you select
all the files a host has, if the host is busy it will only try that host
every so often. Select as many files as you like. The more you select, the
more likely you will complete a large file.

If you click on the "Select all" checkbox at the bottom of the screen
and then click on a file, it will automatically select all files in the
list with the same name that are the same size or larger. This gives you
a better chance of completing a file if a host drops off the network.
Duplicate names are handled in the download manager, so don't worry
about that. It's better to wait a little bit for all the search results
to come in, then click on file names you like. The other choice is "By
size", which uses the file size to select files, ignoring the file name.
The file name that is used to store the file is the *first* one you
clicked on. This isn't perfect, but it will reduce downloading the same
file again when only the name has changed.

Since you used keywords to get a list of file names, that along with
the file size should select the exact same file even if the name
changed. The larger the file the less chance that the size is exactly
the same.
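
The two selection modes can be sketched in Python as follows. This is
an illustration only: the "Hit" record and function names are invented
for the example, and treating "By size" as an exact size match is an
assumption based on the paragraph above.

```python
from collections import namedtuple

# A search result as used in this sketch (hypothetical structure).
Hit = namedtuple("Hit", "name size host")

def select_same_name(hits, clicked):
    """'Select all': hits with the same name and the same size or larger."""
    return [h for h in hits if h.name == clicked.name and h.size >= clicked.size]

def select_by_size(hits, clicked):
    """'By size': hits with exactly the same size, whatever the name."""
    return [h for h in hits if h.size == clicked.size]

hits = [
    Hit("song.mp3", 4000, "host-a"),
    Hit("song.mp3", 4200, "host-b"),
    Hit("song.mp3", 3900, "host-c"),
    Hit("tune.mp3", 4000, "host-d"),
]
print([h.host for h in select_same_name(hits, hits[0])])
print([h.host for h in select_by_size(hits, hits[0])])
```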

Once a file is in the "done" directory, unless you pick a larger file,
the download manager will not download that file again. Incomplete
files can be resumed from where you left off last time. NapShare uses
the filename and size to identify a file for resume, and the name is case
sensitive (unlike windoze, so this does cause some problems). Again,
this isn't perfect.
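
A minimal Python sketch of these two checks, under stated assumptions:
the "done" directory is modelled as a name-to-size mapping, the
"downloads" directory as a set of (name, size) pairs, and both function
names are hypothetical. The real download manager may well use more
information than this.

```python
def should_download(name, size, done_files):
    """Skip a file already in "done" unless the new copy is larger.

    done_files maps file name -> size of the finished copy.
    """
    if name not in done_files:
        return True
    return size > done_files[name]

def can_resume(name, size, partial_files):
    """Resume only on an exact, case-sensitive name match with the same size.

    partial_files is a set of (name, size) pairs for incomplete files.
    """
    return (name, size) in partial_files

partial = {("song.mp3", 100)}
print(can_resume("song.mp3", 100, partial))  # same name and size
print(can_resume("Song.mp3", 100, partial))  # differs only in case
```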

Please don't use the Automation feature until after you manually
download files yourself for a while. You need to know what keywords to
use to pick & filter files and should understand all the little "quirks"
of the Gnutella Network and crazy file names you will find.

Most other clients can now resume, so if you shut down NapShare with an
upload in progress that user will most likely find the rest of the file
on another host. If you get cut off while downloading a file, NapShare
will try the next copy of the same file in the queue until it completes.

There are more features, click around and get used to the program. Most
things make sense. Visit Gnutella web sites for more information on
Gnutella, the protocol and network etiquette.

If you are trying to connect to a private filesharing LAN, see
"Advanced Stuff" below.




Automation
----------

The Automation feature is designed to simulate as closely as possible a
person searching and selecting files all night long. You provide a list
of search words, and filter words for it to use to pick files, and it
does all the work!

Automation is not designed to pick exact files to download, but will get
a range of file types and subjects you like. If you get files you don't
want that's OK because you share them back to help out the network. Plus
you didn't have to sit there all night!

Automation needs a configuration file called "NAPS-auto.txt" located
in the same place as your other NapShare configuration files. An example
file is created if there is none. To reload this file while NapShare is
running, click on the "Load from file" button. You can modify this file
using your favorite editor and reload it whenever you want.

To try it out, click on "Automation" in the upper left hand box to get
the Automation Screen. The configuration information should already be
loaded and displayed in the list.

The following may be a bit confusing at first; examples are provided
in the "Automation Filters" section.

In the configuration file you will find a keyword "list_search", which
is set to the search text. Gnutella uses "AND" for keyword searches, so
if you type two words both will have to be in the file name to get a
return. This text will be used to send out a regular Gnutella search,
just like you did manually.

Another keyword, "list_strings", contains words that you do not want in
the file name, separated by spaces. If NapShare finds any of those words
in a file name, it won't pick that file for download. This is the "not"
filter, the default action if you don't put anything in "list_filters".
There are many filters you can use; see the "Automation Filters" section
below for more filter details.

NapShare's automation "brain" will start when you press "Automation
Start". All manual search buttons are disabled when automation is on.
You can still manipulate the downloads and connections as usual.

Remember, it's designed to work at this all night for many hours at
a time. It may seem to be slow if you watch it, but go away for a
few hours and see what you get. It's designed to be nice to the
network, there are some long delays between search sends.

Automation will wait 20 seconds to make sure there is a good Gnutella
network connection, then it will issue a search from the top of your
list and wait for file "hits".

When it gets a "hit", it checks the file info against your "strings" and
"filters" list of keywords. It queues any file that passes the tests.

After a set delay, if it doesn't get any results it will try the next
search in the list. There is also an upper limit on how many files it
will queue at any one time, and it will try to re-fill the queue when a
lower set limit is reached. It does quite a bit of work.
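
The upper and lower queue limits (the "queued_stop" and "queued_start"
variables described in the Variables section) behave like a simple
on/off switch with a gap between the two thresholds. The Python sketch
below only illustrates that idea; the function name is hypothetical,
the defaults of 20 and 2 are borrowed from the "set" filter example,
and the exact comparisons used by NapShare are an assumption.

```python
def searching_allowed(queued, was_searching, queued_stop=20, queued_start=2):
    """Decide whether automation should keep issuing searches.

    queued        -- number of files currently in the download queue
    was_searching -- whether searching was allowed before this check
    """
    if queued >= queued_stop:
        return False          # queue is full: stop searching
    if queued <= queued_start:
        return True           # queue has drained: start searching again
    return was_searching      # in between: keep doing what we were doing

print(searching_allowed(20, True))   # upper limit reached
print(searching_allowed(10, False))  # still draining, stay stopped
print(searching_allowed(2, False))   # lower limit reached, search again
```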

Automation won't queue a file that is already in the "done" directory
unless it's larger, in which case it will resume that file if possible.
It won't trash the network with searches: it issues only one at a time
and has a set delay between searches. Since you run it all night, it has
plenty of time to try all searches in your list. It will also delay if
it tries all searches and queues nothing, so you aren't sending out
worthless searches all night. I am sure more features will be added to
the automation "brain" as time goes by to make it smarter.

It will take a while for you to get a collection of "SPAM" files in your
done directory. You should leave them so they aren't downloaded again.
Once you have a lot of files in your done directory, things will go
better, give it some time. Add more words to your search phrase and
use more filters to get what you want.

Please share your "done" directory back to the network. This helps keep
popular files available. Hit the "Rescan" button in the config
screen every once in a while to update the internal list.

The values for the delays are already set for most common searching.

The "search reissue" timeout value on the search screen is not used
while automation is on. Automation uses its own settings. Automation
disables any manual searches.




Automation Filters
------------------

There are many filters you can apply to each Automation generated hit.
Once you have gotten used to how Automation works, you can get more
advanced in using keywords to filter your file download choices.

The default filter is "not", meaning if any of the keywords entered into
"list_strings" are in the file name, reject the file.

You can also use "or", it's the same as "not" except that if any of the
keywords are in the file name, then queue the file. The filter "and"
works the same way except all words must be in the file name.

Keywords are also matched inside longer words, so "time" would also be
detected within the word "bedtime".

How do you use the "or" filter? Just place its name in "list_filters"
and add some keywords into "list_strings". It would look something
like this:

list_strings = "mpg mp3"
list_filters = "or"
list_search = "linux sounds"

This would queue any file with "mpg" or "mp3" in it but no others. So
what happened to "not"? Now that you specified a filter, the "not"
filter is disabled and you have to manually enter it to use it.

Now we want to use both the "not" and "or" filters. This is where things
get interesting. Here's an example:

list_strings = "mpg mp3///tramp green"
list_filters = "or not"
list_search = "linux sounds"

Automation will first test the file name using the "or" filter, and pass
the keywords "mpg mp3" *only* to it. If that passes the tests, then it
will test the file name using the "not" filter, and pass the keywords
"tramp green" *only* to it. The "///" separates the strings for use in
each assigned filter. You can stack filters and keywords up to the 4K
entry limit. That should allow some complex filters to be created. You
can call a filter more than once with different keywords. Filters are
always separated by spaces, but keywords are separated by whatever the
filter requires. Don't put in spaces where they are not needed or
required. Another more complex (but useless) example:

list_strings = "mp3///green people///purple"
list_filters = "or not or"
list_search = "linux sounds"

This passes "mp3" to "or", then passes "green people" to "not" then
passes "purple" to "or". A file can fail at any point and all further
testing is aborted.
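
To make the stacking rules concrete, here is a small Python sketch of
how a chain of "not", "or" and "and" filters could be evaluated. It is
not taken from "auto.c"; the function names are invented, ignoring case
is an assumption, and only the three keyword filters are modelled.

```python
def filter_not(name, keywords):
    """Pass only if none of the keywords occur in the name."""
    return not any(k in name for k in keywords.split())

def filter_or(name, keywords):
    """Pass if at least one keyword occurs in the name."""
    return any(k in name for k in keywords.split())

def filter_and(name, keywords):
    """Pass only if every keyword occurs in the name."""
    return all(k in name for k in keywords.split())

FILTERS = {"not": filter_not, "or": filter_or, "and": filter_and}

def passes(file_name, list_strings, list_filters="not"):
    """Run each filter in order with its own "///"-separated keyword group."""
    names = list_filters.split() or ["not"]   # empty list_filters means "not"
    groups = list_strings.split("///")
    name = file_name.lower()                  # assumption: case is ignored
    for filter_name, keywords in zip(names, groups):
        if not FILTERS[filter_name](name, keywords.lower()):
            return False                      # first failure aborts the chain
    return True

print(passes("linux_sounds.mp3", "mpg mp3///tramp green", "or not"))
print(passes("tramp_sounds.mp3", "mpg mp3///tramp green", "or not"))
print(passes("bedtime_story.txt", "time"))
```

The first file passes both filters, the second is rejected by "not"
because of "tramp", and the third is rejected by the default "not"
filter because "time" is found inside "bedtime".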

Another filter is called "size". Its format is "minsize maxsize". You
enter numbers corresponding to the minimum file size wanted and its
maximum size in K bytes, separated by a space. You can leave out maxsize
if you wish, and enter zero for no minsize test. Example:

list_strings = "1 5000"
list_filters = "size"
list_search = "linux sounds"

This would queue files that are larger than 1K and smaller than 5 megs.
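
A possible reading of the "size" filter, sketched in Python. The
function name is hypothetical, and two details are assumptions: that
1 K means 1024 bytes, and that a file exactly at a limit is accepted.

```python
def filter_size(size_bytes, keywords):
    """Check a file size against a "minsize maxsize" string given in K bytes."""
    parts = keywords.split()
    min_k = int(parts[0]) if parts else 0                # zero means no minimum
    max_k = int(parts[1]) if len(parts) > 1 else None    # maxsize may be left out
    size_k = size_bytes / 1024                           # assumption: 1 K = 1024 bytes
    if min_k and size_k < min_k:
        return False
    if max_k is not None and size_k > max_k:
        return False
    return True

print(filter_size(2_000_000, "1 5000"))  # about 2 megs, inside the range
print(filter_size(500, "1 5000"))        # under 1K
print(filter_size(6_000_000, "1"))       # no maxsize given
```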

Then you have "regex", a very versatile filter based on the regex
standard. It would take a lot of space here to explain regex; you can
find lots of documentation on regex online. Here's an example:

list_strings = "mpeg$"
list_filters = "regex"
list_search = "linux sounds"

This would only queue files with the letters "mpeg" at the end of the
file name. Use "^blue" for files whose names start with the letters
"blue".

Regex uses lots of strange characters like ^$?+[-|\] and even '.'
(period), so be careful when using it. For those who need to know, regex
is set to REG_ICASE|REG_EXTENDED which means it will ignore case when
doing its thing and it uses extended regular expressions.
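
You can try patterns out before putting them in the config file. The
Python sketch below is only an approximation: Python's "re" module is
not the POSIX library NapShare uses, so rarely used syntax may differ,
but simple patterns like the ones above behave the same way. The
function name is hypothetical; re.IGNORECASE stands in for REG_ICASE.

```python
import re

def filter_regex(file_name, pattern):
    """True when the pattern is found anywhere in the name, ignoring case."""
    return re.search(pattern, file_name, re.IGNORECASE) is not None

print(filter_regex("Movie.MPEG", "mpeg$"))      # ends with "mpeg", case ignored
print(filter_regex("mpeg_notes.txt", "mpeg$"))  # "mpeg" is not at the end
print(filter_regex("Blue_sky.jpg", "^blue"))    # starts with "blue"
print(filter_regex("v1x0.zip", "v1.0"))         # unescaped '.' matches any character
```

The last line shows why the period needs care: write "v1\.0" if you
mean a literal period.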

More advanced users may want to try making an external program such as
a shell or perl script do some work. Example:

list_strings = "perl my_script.pl"
list_filters = "extern"
list_search = "linux sounds"

This will send the command line "perl my_script.pl" to the shell along
with a double quoted string containing the filename and other
information. Only experienced programmers should use this feature. It
takes a lot of manual debugging before you can use a script. Timing
is also important since you only have a short time to return back. This
should be the last filter you do, using all the builtin filters first.
See the sample perl filter scripts on the home page for more details.
A new MySQL based filter script is available.

You can set variables "on the fly" using the "set" special filter. It
is a special filter that really doesn't filter at all, it just sets
variables that were set by the auto configuration file. This would allow
you to create a list of special files that have special timeout values.

list_strings = "queued_stop = 20, queued_start = 2"
list_filters = "set"
list_search = "linux sounds"

The above would change the configuration variables so that if 20 files
are queued it will stop the search and if only 2 files are left in the
queue it will start the search again. Each variable is comma separated
and since this is like a real filter, the variables are only changed
when a search result is received. You can place this filter anywhere a
normal filter is used. The variables stay this way till you change
them back. Variables that require quotes "" can't be changed.
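
The keyword string for "set" is just a comma separated list of
"name = value" pairs. As an illustration, this Python sketch splits
such a string into a dictionary. The function name is hypothetical and
it assumes every value is a whole number, which fits the rule that
quoted variables can't be changed this way.

```python
def parse_set(keywords):
    """Turn "a = 1, b = 2" into {"a": 1, "b": 2}."""
    settings = {}
    for item in keywords.split(","):
        name, _, value = item.partition("=")
        settings[name.strip()] = int(value.strip())  # assumption: integer values only
    return settings

print(parse_set("queued_stop = 20, queued_start = 2"))
```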

Filters have a sort of built in help. If you put "???" as the first
characters in a keyword string, a short "debug" line will be printed for
that filter on stdout or the log file if it's enabled. You include the
keywords right after the ??? as usual so the filter can still work. This
will let you track down problems when setting up new filters. Example:

list_strings = "???mp3///???green people///purple"
list_filters = "or not or"
list_search = "linux sounds"

This would print out the "debug" message for the first "or" and the "not"
filter, but not the last one. The filters would operate as normal. You
can also always look over the source code in "auto.c" for info too.
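
The "???" marker can be thought of as a flag that is peeled off the
front of each keyword group before the filter runs. A Python sketch of
that idea, with a hypothetical function name (see "auto.c" for what
NapShare really does):

```python
def split_debug_prefix(keywords):
    """Return (debug_wanted, keywords_without_marker) for one keyword group."""
    if keywords.startswith("???"):
        return True, keywords[3:]
    return False, keywords

groups = "???mp3///???green people///purple".split("///")
for group in groups:
    print(split_debug_prefix(group))
```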

In the automation config file you can turn on a debug mode that will
probably print lots of strange information, and there is a log feature.

More filters will probably be added by other people, patches will be
available for them at the project site.




Advanced Stuff
--------------

Here's some other info you may want to know after you have used NapShare
a while.

If you are not behind a firewall, always set the "Maximum of XX total
connections" to one more than the "Try to keep at least XX connections
up". This allows other people to make "incoming" connections to you and
helps network health. Your "outgoing" connections were courtesy of
someone who allowed incoming connections.

If you are behind a firewall you will share and get fewer files because
of connection limitations. You should open the port (default 6346) on
your firewall to get better results. If the port is not open, click on
"never send a push request" on the download screen to get better
performance: you can't accept the incoming connections needed for a
push if your firewall is closed.

Try a "right click"; on some screens there are useful options. In search
you can get rid of the tabs and use the pull down menu for multi search.
In downloads you can kill and remove a download.

Stop searches as soon as you are done selecting enough files, or set
the "reissue timeout" to a high value. Searches take up most of the
network resources. If you let NapShare run overnight, please check
that you have no active searches going.

Clicking on "size" in the search screen will sort the files by size.
"file" will sort by file name.

You can "stretch" the file column to show more of the file name, and its
size will be saved when you exit. The whole program window can be
stretched by clicking and dragging the lower right hand corner.

GnutellaNet connections: "S" is "Sent", "R" is "Received", "D" is
"Dropped" and "B" is the Send Buffer size in K Bytes. Dropping a
message is a normal thing on Gnutella, they "die" after a set
Time To Live, or TTL. Slower nodes will show a larger Send Buffer,
or if you produce a lot of search results from your shared files, this
number will increase. It's regulated by "node_sendqueue_size" in the
config file. If sharing out files is more important to you than
downloading, increase this value.

A node with a high "B" send buffer size is slow in receiving Gnutella
messages. Since your download speed is most likely faster than your
upload speed, and since you may be receiving and re-sending packets
from more than one other node, plus you could be uploading a file,
it could be due to your connection's upload speed limitations. No
need to worry, we send our search and other requests first as
"priority" packets. Slow nodes are still good to have around and
form the base of the Gnutella Network.

The "Max TTL" setting is the number at which you will drop a message
and not pass it to other hosts. "My TTL" is the TTL number you attach
to your search packets. 5 is a reasonable number and helps keep network
traffic down. Think of it this way, you send your search to 3 hosts
that send it to 3 hosts each and those hosts send it to 3 hosts each,
it gets pretty big very fast. Use 7 max as most hosts will not pass
a packet with too high of a TTL to protect the network.
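
The growth is easy to work out. This Python sketch adds up the hosts
reached at each hop, under the simplifying assumptions that every host
passes the search on to the same number of new hosts, that no host is
reached twice, and that nobody drops the message early. Real networks
are messier, so treat the numbers as a rough upper bound.

```python
def hosts_reached(fan_out, ttl):
    """Total hosts that see a search when each host forwards to fan_out others."""
    return sum(fan_out ** hop for hop in range(1, ttl + 1))

print(hosts_reached(3, 5))  # 3 + 9 + 27 + 81 + 243 = 363
print(hosts_reached(3, 7))  # 3279
```

Going from a TTL of 5 to 7 makes the search reach about nine times as
many hosts in this model.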

"Minimum connection speed" for searches applies to each search
individually. If you have several searches going at once, each one can
have its own speed setting.

You can use the "Reflected IP" in the config screen to find out what
yours is at this time. It is reflected back by other clients you try to
connect to. The "System IP/Port" is what NapShare thinks your IP is. You
can use these to set NapShare up when behind a firewall or "NAT" router.

Uploads will pop on and off the screen. This is normal now due to new
clients that do "multi source downloading" and drop the connection after
getting a small part of a file. Only completed uploads stay on the screen,
or technically, the ones that read to the end of the file.

If you are on a private LAN, you can change the connect headers
that NapShare sends out via the config file settings. You can monitor your
LAN packets with "tcpdump" (as root) to find out what header names
other nodes are using. You can change the vendor name and vendor
code if needed to establish a private connection. This was not completely
tested as of V1.2. The "LAN: yourname" header is sent if you enter a
"lan_name" into the config file. The headers sent are correct, but what
a particular client will do with them is unknown. Please send feedback
and code patches if you have further info.

Filtering searches still doesn't work and is disabled. Use the automation
filters; they work great.

In the automation config file you can set "node_kick_mode" to kick a
node after a while. Settings are 0=normal, 1=end of every list pass,
2=if previous search queued zero files, 3=after every search. If
a new node comes up, the latest running search will be sent to it.
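
One way to read the four "node_kick_mode" settings is as a check made
after each automated search. The Python sketch below is a guess at that
logic, not code from NapShare: the constant and function names are
invented, and it assumes that mode 0 ("normal") never kicks a node on
automation's behalf.

```python
# Hypothetical names for the documented values 0..3.
KICK_NORMAL, KICK_END_OF_PASS, KICK_IF_NOTHING_QUEUED, KICK_EVERY_SEARCH = range(4)

def should_kick(mode, end_of_list_pass, files_queued):
    """Decide, after one search finishes, whether to kick a node.

    end_of_list_pass -- True when the last search in the list just ran
    files_queued     -- number of files the finished search queued
    """
    if mode == KICK_END_OF_PASS:
        return end_of_list_pass
    if mode == KICK_IF_NOTHING_QUEUED:
        return files_queued == 0
    return mode == KICK_EVERY_SEARCH  # mode 0 falls through to False

print(should_kick(2, end_of_list_pass=False, files_queued=0))
```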



How Push Works
--------------

A quick overview of how push works.

Push was created for hosts that are firewalled. Since firewalls allow
people behind them to *initiate* a connection, they connect to the
Gnutella Network first (Gnet), then wait for a file request to come to
them via the Gnet, and they initiate a direct connection back to the
IP address of the requesting person. Since they started it, the
firewall allows communication (both ways). The "push" packet sent
via Gnet contains the file name and IP:port address of the requesting
client program.

If you are behind a firewall, you can still browse the web because you
connect to a server and ask for it to send data back over that same
connection. Same with NapShare. If you ask to connect to a host for a
download, the host will send back the file to you over the same
connection, the firewall allows this. If someone wants a file from
you, they have to ask for it via Gnet packets. NapShare monitors all
network traffic looking for a "push" request directed at you. This
request packet contains an IP address to connect to, and then NapShare
initiates a connection to that person and sends the file.

There are limitations to this, and some confusion over how it works
even among developers! So to clear things up a bit:

1. Two firewalled hosts can't "push" to each other.
2. Firewalled hosts shouldn't request a "push", they have no real IP to
   directly connect back to.
3. Files found that show an internal network address like 10.0.0.0 may
   still be available via push.
4. You don't need port 6346 "open" in order to use NapShare.
5. NapShare will select a random "outgoing" port number when initiating
   a connection, which then communicates both ways. That port could be
   3000, 5357 or any other number within reason. 6346 is there for other
   clients to initiate a connection with you.
6. You aren't helping the network health by hiding behind a firewall; Linux
   is very secure. Simple firewalls like "Shorewall" are now very easy
   to configure with tools like "webmin" or your distro's control panel.
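
Points 1 and 2 boil down to a small decision table. Here it is as a
Python sketch; the function name and return strings are invented for
the illustration, and "firewalled" here means "cannot accept incoming
connections".

```python
def transfer_method(downloader_firewalled, sharer_firewalled):
    """How a file can travel from the sharing host to the downloader."""
    if not sharer_firewalled:
        return "direct"      # downloader connects straight to the sharer
    if not downloader_firewalled:
        return "push"        # sharer connects back out to the downloader
    return "impossible"      # neither side can accept a connection

for downloader in (False, True):
    for sharer in (False, True):
        print(downloader, sharer, transfer_method(downloader, sharer))
```

Note that a firewalled downloader can still fetch files directly from
any host that is not firewalled; only the firewalled-to-firewalled case
fails.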

If you allow outside connections to come in via port 6346 through your
firewall then it will be the same as if there was no firewall as far
as NapShare is concerned. Some firewalls can block known packet types
and prevent programs like this from communicating. Check with your
admin if you have any problems.

If you are behind a "NAT" or other firewall, NapShare will most likely
think your local IP is the internal one, like 192.168.X.X or 10.X.X.X,
so you have to "force local IP" in the config screen. You can use the
"Reflected IP" to find out what yours is at this time. It is reflected
back by other clients you try to connect to. The "System IP/Port" is
what NapShare thinks your IP is.
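
If you are not sure whether you need "force local IP", compare the two
addresses. The Python sketch below uses the standard "ipaddress" module
to show the idea; the function name is hypothetical and the addresses
are only examples.

```python
import ipaddress

def should_force_ip(system_ip, reflected_ip):
    """True when the system address is a private one and differs from the
    address other clients report seeing."""
    system = ipaddress.ip_address(system_ip)
    reflected = ipaddress.ip_address(reflected_ip)
    return system.is_private and system != reflected

print(should_force_ip("192.168.1.10", "8.8.8.8"))  # behind NAT: force the IP
print(should_force_ip("8.8.8.8", "8.8.8.8"))       # addresses agree: nothing to do
```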

Some firewalls are simply a Linux box sitting in front of a bunch of
Windows machines. Sort of tells you something about OSes and security,
doesn't it!

Proxy servers are another story.....



Bug Reports
-----------

This software will always be in a state of development, with improvements
being added all the time. Stable versions will be posted from time
to time, but your feedback is needed by developers in order to
make a great program!

Please post any Bug Reports to the Bug Manager on the sourceforge
project page.

To use the "gdb" debugger, type the following at a terminal prompt (the
last two lines are entered at the gdb prompt):
cd directory/for/napshare
gdb napshare
handle SIGPIPE nostop
run



Variables
---------

The file "NAPS-auto.txt" contains the delay variables and the keywords
for automated searches. Some of the values are explained here further.
They can be edited when the program isn't running.

"queued_stop" - Automation will try to queue as many files as it can.
Since some searches return a lot of results, this is the upper limit
before it stops. It will start searching again when the queue reaches
the lower limit set by "queued_start" or most likely when it reaches the
"flush_timeout" time and the queue is emptied.

"search_time" - This prevents searches from being sent out one after the
other. 300 seconds is the lower limit (5 minutes). If you are testing,
stop and re-start automation and this timer will reset. Set it to 600
normally.

"search_timeout" - How long to wait after a search is issued and either
no results come back, or after all the results stop coming in. 60
seconds is the lower limit on this. Set it to 180 normally (3 minutes).

"flush_timeout" - File data gets old, connections change, so a new
search should be issued after an hour or so. This purges old files in the
queue if they sit there too long. It doesn't affect any running
downloads. Only works when automation is on.




Developers, Programmers and Other Tech Info
-------------------------------------------

The following information is for Developers only.

Running multiple copies on the same computer:
If you simply do a "make" rather than a "make install", a version of
NapShare will appear in your "src" directory. Create a new
directory called "test1", then copy the config file "NAPS-settings.txt"
from the ".napshare" directory in your home directory into "test1".
Place the compiled version of NapShare into that directory, add a
"download" and a "done" directory, and from a shell prompt type
"./napshare". The program will check for the local config file first
and use it. The hosts file will now
be read and saved to the local directory. If you change the port number
in the config file, you can run as many copies as you want from
different directories. You can also have them connect to each other
by using IP 127.0.0.1:6346 (or whatever port). If you set the config
variable "stop_host_get" the client will not try to connect to any
outside hosts, thus you can manually connect to your local IP:port
of the other client you have running for testing. It won't timeout
the host because of inactivity like it normally would.

Debug variable use:
A config variable called "dbg" sets how much debug information is printed
to the terminal window. It has 5 levels; the higher the level, the more
information is printed. Don't set it to 5 unless you like to see way
too much information! Data such as raw search packets are shown at that
level. Small warnings and other messages are shown at level 1 or 2.

Developer info:
Feel free to sign up and participate in the mail list. To get up to
speed, you might want to read some of the past messages in the archives
and see what we have been up to lately.

Watch out when modifying code that you free any memory you allocate.
The function "strdup()" allocates memory that needs to be released with
"free()" in some cases. malloc is used several times in the program.

When editing the code, please turn off any options in your editor that
replace tabs with spaces. We use tabs that are 3 spaces wide in our
source files. And turn off line wrapping!

Please use the patch manager found on the main development page to post
any patches you may have come up with. It allows you to upload the patch
file as a binary. Posting a patch to the email list usually wraps lines
and messes up a patch file. You should let everyone on the mail list
know that you posted a patch to the patch manager.

If you find bugs, please use the bug manager on sourceforge (or fix them!).

Once you get a copy from anon CVS (see the CVS page for instructions)
and have made some modifications, cd into the "napshare" directory from
a shell, type "cvs update -A" and fix any conflicts so you have a
current CVS version plus your changes, then type
"cvs diff -uD "now" > diff-TODAY.patch" (replace "TODAY" with month-day)
and a patch file will be created, referencing the current CVS version.
You can then upload this file so others can patch their version with
your changes. We use the "unified" format for patches, thus the "-u".
If your patch requires other patches, you should include them also
(this happens automatically if you patched your own code already).

Before starting a big project, you may want to ask the list if anyone
else is working on that fix so you don't duplicate work.

Good patches will normally be included in the next release, but no promises.



Authors
-------

See the file "AUTHORS" for more info.
NapShare was originally based, and is still based, on source code from
Gtk-Gnutella V0.13, with further contributions to both NapShare and
Gtk-Gnutella by the other authors listed in that file.

