User's Guide to the Biotella Entomological Database Application

by Hannu Saarenmaa

1. Introduction

Biotella is the name of the database application described in this document.  Its purpose is to capture and share entomological observation data.

Biotella provides the basic functionality that is needed by a field entomologist, such as recording of sites, observations, sightings, and book-keeping of collection specimens. It also provides for export and import to and from central data warehouses on Internet.

This document describes how to acquire, use, and maintain the application.

2. Development, platform, availability, and support

Some history first. I started developing something like Biotella already in 1981 together with Erkki Kaila and Jorma Kyrki [Saarenmaa et al. 1984].  It was originally written in Fortran for a PDP-11/34 minicomputer.

In 1993, Biotella was ported to its present platform, which is the Microsoft Access data base management system.  Today Biotella runs on and requires Microsoft Access.  You must have this commercial program in order to use Biotella.

Biotella is developed continuously as a private programming project.  New versions are made and released as needed and enough important improvements have accumulated.  Your comments and feedback are therefore appreciated [to the addresses at the end of this document].  The application is also open for your own improvements using any of the tools provided by Microsoft Access, if you have some knowledge how to use them.

Biotella is distributed under the Mozilla Public Licence 1.1.  This means that you can use it free of charge.  You can extend it, and even get paid for your developments if you find a customer.  However, you should make your developments again available for others.

There is no real support to Biotella.  You use it at your own risk. A little understanding of relational databases in general, and Microsoft Access and Visual Basic in particular is useful and rewarding when working with Biotella.  However, you can use it just "as is" and you don't have to do any programming in order to use Biotella.

3. Installation

3.1. First installation

There are only very few things that are needed before you can start using Biotella.

You must download it from the source given at the end of this document and place it on your hard disk.

You must identify you database uniquely.  This is necessary for import and export and many defaults such as leg. and coll. values used in labels and input.  To do this, you must have an entry for yourself in the person table and another entry in the collection table.  You do this by starting the application and choosing from the "References" menu "Person".  Enter there in a new line a 3-character code for yourself.  A suitable code is what has been used for you earlier by scientific societes when you have announced observations.  (If you want to share your data with others, and also avoid this step in future Biotella upgrades, you should send an email to the address at the bottom of this User's Guide, indicating what code you are using.  Indeed, som standardisation in this area is needed.) Also enter any other personal details you wish. Close the "Person" form.

Then do the same for "Collection". In this table you must have at least one line, which describes your own collection and database, and the field "Thisflag" checked for it. The field "Keeper" should contain your person code. In this record, you can also identify the path to your favourite browser, the location where you have installed Biotella.

Before entering some Catches or Specimens, you should have some data in Site and Period ("Times") tables.  In summary, in order to get started without hassle with Biotella after download, go through the following steps:

  1. From "References" menu, open "Person" form and have your data there.
  2. From "References" menu, open "Collection" form and create one record there, where you are the "Keeper" and where you have checked "Thisflag".
  3. From "Records" menu, open "Sites" form and create at least one record.
  4. From "Records" menu, open from "Times" the "Period" form and create at least one record there.
As an option, you may want to consider changing the code set that is used for names.  [See the section 5.2 below.]

3.2. Upgrade installation

If you have been using Biotella for some time and download a new version, you must transfer your data from the old application to the new.  There is currently no direct support for this.  You must start the new application and choose "Database" --> "Exit to Access".  Go to "Tables" and "File" --> "Get External Data" --> "Import" and copy all the tables from the old database to the new.  Any changes in the table structure, which you should do, are documented in the Release Notes that come with the application.

4. Structure

Structure of the Biotella database is in the 3rd normal form [Figure 1] . This means that all data is stored only once.  A few exceptions have been made in order to speed up the processing, but in principle a purist approach has been followed.  Biotella tables closely mimic real world objects.

The most important tables of Biotella form a hierarchical structure.  At the top you find "Site" and "Period".  These define a space-box and time-line.  Under these is "Observation", which describes the observation event and methods.  During each "Observation" a number of sightings, called "Catch" here, were made.  One catch is a cohort of similar insects defined by taxon, stage/gender, and generation.  Under each catch, a number of "Specimens" may or may not have been collected.  Most of the other tables have just subsidiary roles.

There are a few points where the design could be different.  "Period" could perhaps be collapsed into "Observation" alltogether.  "Team" carries an unnecessary "Period ID".  Geographical co-ordinates are kept in a table separate from "Site" in order to facilitate their independent input from other applications and GPS. Many redundant coordinate systems have been offered.

The current version lacks most notably place for food plants and rearing history.  Also a digital image archive could be created.  Suggestions how to do these are gratefully appreciated.

Throughout the system codes used by computer for linking tables are denoted with "ID", such as "Taxon ID".  You should not bother with these "IDs", although you often see them. Codes intended for human use are denoted "Code", for instance "Taxon Code". You should learn to make use of these "Codes" as they convey loads of information in short form.

5. Functions

In the following the functions of Biotella are described by the menu structure.  We begin with the second top level menu, because that is where the action is.

5.1. Records

5.1.1. Sites

The "Site" form is used to store data of places.  A site can be as small as a single trap location. Indeed, you should avoid merging (and losing) information outside Biotella: if you have two traps, create two locations -- one day you will want to compare them.  A new record can be created using the standard Microsoft Access record navigator at the bottom of the window.

"Site ID" is automatically generated number for computer use only and can not be changed. "Site Code" works closely with it.  It is a 8 byte mnemonic identifier that the user can (in fact, must) choose to quickly refer to the site in other parts of the system.  Examples of "Site Code" which you could come up with are "Saana", "Olos",  "Koli-K07", "CPH-A7", "Skogahus", "Utula1".

"Country ID" is maintained table "Country", which must be visited to add/edit that information.  Similarly, "Region ID" comes from table "Region".  Choices of "Region ID" are governed by what is entered for "Country ID" in the "Region" table.  Similarly, "Municipality ID" comes from "Municipality" table and the choices in this combo box are limited by the regions.  There is a greyed, non editable info field next to it, which is automatically updated when the cursor is placed in "Site ID", "Site Code" or "Municipality ID".

"Exact Location" is meant to store any free text about village, mountain, lake, street, or similar.  "Biotope ID" is not yet implemented.  "Description" is a free comment.  There is even a button with a picture of binoculars, which you can use to bring up Microsoft Access standard a search tool.

Compared with EU geographic classification, Country matches NUTS1 level, Region NUTS3 in most countries, Municipality NUTS4, and Exact Location NUTS5.

A separate panel groups the geographical co-ordinates that refer to the site.  You must manually enter to "Point ID" the key from "Geographical Point" table/form.  "Line ID" and "Area ID" have not yet been implemented, but it is worth making note that the former can be used to exactly describe a survey trail.   Click the "Find/Edit" button to enter the form that can be used to manage point co-ordinates.  If there is a value in the field for "Point ID", that record is located when the new form opens.

The form "Geographical Point" manages a similarly named table.  Also there a new record can be created using the standard Microsoft Access record navigator at the bottom of the window.

There are fields for four different co-ordinate systems.  "Latitude/Longitude WGS-84" refers to geographic co-ordinates of this global system in format "degrees.decimal degrees".  "Latitude/Longitude KKJ" is the one used in Finland, and also stored as "degrees.decimaldegrees".  "Y Center" and "X Center" refer in meters to the north and east co-ordinate, respectively, of any nationally used flat co-ordinate system. "UTM" is the area-based international-system.

Remember, with the exception of latitude/ longitude, spatial dimensions in Biotella are always expressed in meters.  Also be aware that the system has been tested only using a dot "." as the decimal separator.  (If you use the comma, you will get strange errors.  Consider changing this in Windows' Control Panel's Regional Settings.)

Usually the co-ordinates are obtained in one system only and the others are calculated [c.f. Ollikainen 2000].  Such calculation can be made, for instance, using the WGS84 program [Haljala 1997].

"Point Type" can only have two values "C" for centerpoint of a circle and "S" for southwest corner of a square.  The former is useful for geographic coordinates and GPS use (C above), whereas the latter is useful when locations are expressed in flat coordinate grid squares (S above). In both cases "Point Size" should be used to denote the length in meters of the edge of the square or the diameter of a circle.  For the convenience of the Finnish users who need to express locations in what is called Yhtenäiskoordinaatisto, "Y Square", "X Square", and the "Grid co-ordinates" are calculated to denote the SW-corner of a square in the given "Point Size".

"EPE" stands for "estimated positioning error" which can be used to state the accuracy of a GPS measurements.  There is a check mark whether the co-ordinates come from GPS.

The remaining fields are for information only.  In the form there is a large panel that uses the "Y Center" and "X Center" and shows how the relative position of the point fits in the given panel. There is some possibility for zooming.  This panel is still experimental and will be more useful once the nearby sites can be shown.  These are already listed in a little subform that lists the other sites that fall within the extent of the panel.  However, these are not yet placed on the panel.

A special button "View on Map Server" becomes enabled if the "Latitude-WGS84" and "Longitude-WGS84" are given.  This button opens your favourite browser and connects to the ArcData Online service of ESRI, centering on the current coordinates.  In a map that comes up, you can see where the site is actally located. [Try it to see where I write this!] You may use the map server's tools to pan and zoom.  Pretty cool and probably also quite useful! Unfortunately I am not awate how to put markers of any kind to the right place therein.

5.1.2. Times

Times refers to "Period" table.  It defines a time box.  A new record can be created using the standard Microsoft Access record navigator at the bottom of the window.

"Period ID" is computer generated and not editable.  The mnemonic "Period Code" should be used to quickly refer to this time line.  Anything goes here, but a strongly recommended syntax that is used at least when quickly entering specimen data [see 5.1.4 below] is YYMDD[MDD].  This way the user can with the least possible number of characters refer to a time box.  For instance 99315 could be interpreted to be 15 March 1999.  99315401 could be interpreted to mean a period between 15 March and 1 April 1999.  For the new millennium, a leading "a" means the first decade. For instance a1312 would be 12 March 2001.

Regardless of the above quick syntax, the real date and time are stored in the fields "Time" and "End Time".  These are long datetimes such as "2001-03-17 18:39:00". The settings of the operating system determine how these are actually shown. (Be aware that only the ISO date style, of which an example is above, has been tested througout...)

You also see a subform that allows viewing the team that participated in the observations made during this period.  The field "Rank" can be used to determine order, if desired. However, "Team" is entered and managed in a form launched from "Observation".

In the lower part of the form there is a subform (read only) that shows the observations that were actually made during that period.  This is useful information when entering data from multiple simultaneously running traps.

There is a button that opens a form for recording weather during the period.  However, that has not yet been linked with the rest of the system.

5.1.3. Catches made during and observation event

This is the main form for entering field observations.  The main form covers the "Edit" and "Search" functions against the "Observation" event table.  The subform "Catch" covers the insects that were observed during that observation event (bird watchers would call the catch records sightings).

A new record can be created using the standard Microsoft Access record navigator at the bottom of the window.

The field "Observation ID" again is for computers only.  In the "Edit" panel, the user must give the mnemonic "Site Code" and "Period Code" that were described above.  It would helpful if they would have been entered already in advance.  This way, it would be a simple job to type them or choose them from the pull-down menus.  However, if that is not the case, or the user wants to verify the data behind these codes, the buttons "View/Edit" are available to enter the "Site" and "Period" forms.  There is a handy return button available in those forms, which is only enabled when a new "Observation" is being created.  Clicking this button "Insert into New Observation" enters the current site and period identifiers in the target fields in the "Observation with Catches" form.

"Quantitative" is a box to be checked, if this observation represents a quantitative sample, such as a complete count of numbers of all insects in a light trap (a good practice, by the way). "Empty" is checked by default and denotes if nothing was caught/seen during the observation event.  It is automatically unchecked when the first catch is entered.  "Method" is a pull-down list from a similarly named table.  "Efficiency" can be used to record the number of trapping devices or baits.  "Proportion" can be used to describe such (rare) situations when not all material was counted.   "Date reported" is a timestamp and "Comment" available for any remarks.

"Reporter" is the code of the person who is responsible for communicating these catches to the rest of the world.  It may or may not be the same as the real observer.  This code is automatically copied by default from the "Collection" table.  However, the real observers are kept in the "Team" table and form.  By default, the Reporter is entered there automatically as the first observer.  You can change this or enter also other observers, if needed.  "Rank" can be used to denote the order by which the observers should stand in a publication.

Warning:  This "Edit" panel should not be used when searching data.  Any changes there are reflected in the catches.  This could lead to gross errors!  Therefore warnings will be issued if the user wants to change observation data when catches already have been entered.

"Search" has therefore been enabled in a totally different panel in the lower part of the form.  The choices in the "Period" therein are limited by the available "Sites".  Using this panel, the user can quickly retrieve any observation with the related catches.

The "Catch" subform is the main area where new field data is entered.  One line here represents a cohort of insects. By a cohort, we mean a unique group where the taxonomic unit, gender, stage, and generation are the same.

In the subform, "Catch ID" again is for computers only and shown just for information.  "Taxon Code" is an 8-character mnemonic code to any taxon.  While most users in northern countries would use it to enter species data, it can actually be used to enter also catches that have only been identified to, say, genus, or even higher taxonomic rank.   Therefore, the default is "L" which stands for "Lepidoptera".  That is, nothing more accurate is known of the taxon at hand.  (This way, one can make a trip to Panama, collect large numbers of  various moths, and still be able to enter them in one stroke in the database, for printing labels if for no other reason...)  For species "Taxon Code" has been formed from the 3+5 first characters in the names of genus and species, respectively.  It may not be always easy to remember the full code (often one just remembers the species name, while the genus name has changed lately). In such situation, one should navigate closer to the right choice by entering some closely related "Taxon Code".  Before leaving the field, the user should then use the pull-down menu to complete the data entry with the right "Taxon Code".  Indeed, the "Taxon Code" field is "smart": it suggests the first possible completion of the code without requiring the user to type in the entire code.  This is a very powerful feature.  Experience has shown that most "Taxon Codes" can be entered in with just 4-5 characters, even when using the entire European Lepidoptera catalogue [see 5.2. below].  Upon leaving the field,  "Taxon ID" is looked up from the database.  "Taxon ID" is a long integer that is the same for all synonyms of a taxon.

"Stage" actually combines gender and stage and comes from a similarly named table.  "Generation" is defined here as the number of generation during a calendar year. Hence, it is correct to enter "2" as the value for Aglais urticae sightings in August, although this species that hibernates as image has only one annual generation.  "Quantity" is the number of specimens observed. "Accuracy of Quantity" is a plus/minus range related to the above.  For instance, about one hundred Pieris napi were seen, but the user thinks that it could have well been even 200 but no less than 50.  In this case, one should enter 125 as "Quantity" and 75 as "Accuracy of Quantity".  "Certainty" is by default "True", but should be turned by the user to "False" if there is any doubt of the correct identification of the taxon.  The export and analysis routines, by the way, skip any records where this flag has been set.

There are no dual fields for male and female catches, which are often used by collectors.  These should be entered in two separate lines.  There is not special field for forma and aberrations. There should be distinct codes for these among taxa.  Indeed, it is technically possible, although not advisable, to have several lines for identical cohorts.

"Collected Quantity" should be used to show the number if specimens captured for collection. Any number greater than zero here actually generates the rows in the "Specimen" table. (Changing this number currently generates not the difference, but the full number (to be fixed).)  However, it should be noted that this variable is for information and functional automation only.  It does currently not accurately reflect the contents of the "Specimen" table. Records there can be added and deleted without making changes here.

5.1.4.  Specimens

There are two ways to view specimens: by taxon and in a plain table.  The latter contains almost no automation, so we describe here only the former.

In the top part of the "Specimens by Taxon" form the pertinent taxon data is shown from table "Taxon".  There also is a little panel for setting the default values for leg., det., and coll.  These are only used when entering new specimens.

As there are many taxa, it is almost always necessary to search for a particular one. For this currently only the built-in Microsoft Access search tools are available.  The user should place the cursor to the relevant field and hit Ctrl-F to invoke the search tool. (This is useful throughout the system.)

The "Specimens by Taxon" form allows viewing any specimen data that was generated while recording field catches. However, it would be too tedious to use the above path (site - period - observation -catch) to enter data for already existing collection specimens.   Therefore this form also allows entering collection data directly.  Indeed, one could say that the Biotella database can be approached from two directions.  Above we have described the top-down fashion where first the space-time box is defined, then the observation event, catches made and finally the specimens saved. The other way could be understood to be a bottom-up approach.  It only happens indoors while examining a collection. However, a fully normalised relational database model allows storing data in one and same structure in both cases.

In the subform, "Specimen ID" is for mainly for computer use, but it should be noted that you can use it to identify specimens in your collection.  It is therefore printed into labels [see 5.6.4].

The familiar "Site Code" and "Period Code" are available here to facilitate rapid entry of specimen data.  How that works is described at the end of this section.  Also "Stage" and "Generation" are carried over from "Catch".  (This is one of the few places in Biotella where some information is stored twice.)

"Quality" is assessed using the definitions of Helsinki Entomological Exchange. "Spread" is checked if the specimen is spread. "Collection ID" is usually same as the database id, but because the user may store information of specimens also in other collections in his/her database, it is available here.  "Box ID" can be used to identify where in collection the specimen is.  "Part of" can point to another "Specimen ID", for instance in case the specimen is a genital preparate.  If "Labelled" is empty, Biotella can print labels for such specimens [see 5.6.4].  "Label Text" can be used to type in the actual data in an older label.  However, Biotella does not store anything here automatically when it prints labels, as the information is already in the system.

When entering specimen data from collection, one is basically challenged by creating or linking to the appropriate existing records in Biotella' structure.  If the records exist, it should be easy to find them in the pull-down menus provided by the "Site Code" and "Period Code" fields in the subform.  Once they both have been located, and the user leaves the "Period Code" field, the pertinent "Observation" record is automatically located or created.  When leaving the "Specimen" record, a similar "Catch" record is created (no attempt is made to locate a possibly existing "Catch" record").  All these four tables are linked together.  If an appropriate "Site Code" does not exist, the user can double-click the field, which brings up the "Site" form.  There the data can be entered and when clicking the "Requery and Close" button, the new "Site Code" becomes available (although it is not yet shown in the list until the form is closed and restarted).  In the likely event that an appropriate "Period Code" does not exist, the user can enter the date from the specimen's label in the syntax described above [5.1.2].  This brings up the familiar "Period" form, which is pre-filled but must be checked.

The above facilitates a rapid entry of specimen data.  However, it is advisable to enter in advance the sites using the "Site" form as there is no standard syntax that could facilitate rapid generation of the "Site" records.  For "Period" this is much easier.

Warning:  Any changes in existing "Site Code" and "Period Code" in the "Specimen" data are currently unsupported.  One should understand that these are only automation objects and not the real data to be  used for analysis and exportation to other systems.  Probably these must be kept in sync in order to avoid misinterpretations, but the likely solution is such that a query just regularly overwrites these codes in "Specimen" with the codes from their original sources, and reports eventual discrepancies for a possible manual correction.

Finally, one should make note that the "Leg." data is entered in the "Reporting Person" in "Period" and the "Coll." data is entered in "Specimen".  If a "Det." field is filled, corresponding records are automatically created in the "Verification" table, but that part of the system is still little tested.

5.2. Taxonomy

Biotella currently comes with a taxonomic database of European Lepidoptera.  This is based on the CD-ROM in the publication of Karsholt and Razowski [1996], but has been heavily processed from it. For instance, unique "Taxon Codes" and "Taxon IDs" have been generated, linkages to supertaxa made, and synonyms generated.  The number of records has grown from 10607 to 14365.  The original flat file that comes on CD-ROM with the book is not suitable as such for operational use. However, in order to avoid copyright violations, only "Taxon" one of the two major tables used by Biotella is distributed in full length.  The other table "AvailableName" is only available as excerpts. No distributional data from the book is used by Biotella.  The purchase of the above book is recommended in order to be able to make full use of Biotella taxonomy functions described in 5.2.3.

Why does Biotella not use some established codes, but comes with its own set?  The sad fact is that there are no such established, widely applicable codes.  The closest is the RUBIN code set (Rutin för Biologiska Inventeringar) created by the Nordic Code Centre [NCC] and used for instance by the Nocturna application.  However, because of short-sightedness of the authorities, the NCC was laid down in 1993.  The codes are no longer centrally managed and new ones are not made available.  Moreover, RUBIN only covers North-Western Europe and nothing comparable is available in other parts of Europe.  In North America, one may want to use ITIS Taxonomic Serial Numbers. We hope that the GBIF or Species 2000 will one day issue the numbers and codes globally.

5.2.1. All taxa

This is just a data sheet view to the table "Taxon" that is used widely in the system.  It defines "Taxon Code", "Taxon ID", "Taxon Name", "Validity", "Rank", and "Supertaxon".

When changes in taxonomy occur, one may want to edit this table.  However, it is required that all changes be made after the information received from official museum sources.  IDs and codes meant for sharing should not be invented by the user as they will be refused by the central datawarehouses.  Check the download site for any new "Taxon Code" and "Taxon ID".

"Taxon Name" is in principle a triplicate: 1) Genus or higher taxon name, 2) species level name, 3) subspecies level name.  In addition, one may add names of forma as the 4th element.

The table contains codes and IDs for all available synonym and homonym names.  The "Taxon Codes" of these different names can be used for data entry.  However, the "Taxon ID" always points to the same taxon.

5.2.1.1.  Use of RUBIN codes
As said, Biotella comes with its own "Taxon Code" and "Taxon ID".  However, it is possible to use Biotella with RUBIN codes, if the user so chooses, because the RUBIN codes are included in the "Taxon" table (just for this possibility).  This is a small modification and must be made manually.  In the table "Taxon", the field "RUBIN ID short" must be renamed "Taxon ID" and "RUBIN Code short" must be renamed "Taxon Code".  Naturally the existing fields must be renamed to something else, but the original "Taxon Code" should be kept as primary key. In addition, in the "Collection" table for the record that thas "Thisflag" checked,  the field "Codeset" must be changed to "RUBIN" from "Biotella" so that the export function can properly tag the data.

This swap of codeset is normally a one-time operation when Biotella is installed for the first time.  Changing it after there is data in the database requires extensive manual conversion work.  However, this change is possible because (not incidentally) the field "Taxon Code" is 8 bytes in both systems, and the "Taxon ID", which is a signed long integer (32 bits) in Biotella, but an 12 bytes long string in RUBIN can with little effort be truncated "RUBIN ID short" which also is a signed long integer.  This was done (for Lepidoptera) so that for the original RUBIN IDs over 720000000000 but below 720999999999 (so called micro moths), the first of the above numbers is deducted from the "RUBIN ID short" and then the remainder is multiplied by -1. For the others (so called butterflies and macro moths), 719000000000 is reducted from the original.  (Computers today are built for either 32 or 64 bit operation (that is, 4 or 8 bytes). Why RUBIN chose to use 12 byte numbers as strings for keys is not easy to understand...)

5.2.2.  Rank and validity

"Validity" and "Rank" come from similarly named tables and can be viewed/edited from the menus.  The former tries to cover most cases known by the International Code for Zoological Nomenclature [1999], but is not yet complete.

"Rank" is based on the codings of ITIS.  It is useful to notice how the taxon concept used in Biotella actually allows dealing with splitting of species with the help of the "Rank" attribute.  When a taxon that was believed to be of "Rank" 60, that is, species, is discovered to be actually two species, it is changed in the "Taxon" table to "Rank" 58, that is, an aggregate species.  The new two species are then created their own "Taxon" records, which can be used from this on. The old data remains unchanged. Only its interpretation is new.

To summarise, a Biotella user makes observations of taxa, not species, nor names.  A taxon is a flexible concept that can go nearly as low as a single population.

Currently only a few subspecies are available and forma and aberrations not at all.  Contributions to these are appreciated.

5.2.3.  Other taxonomy functions

The menu also allows viewing by taxonomic category, authors, and printing of name tags for collections. These functions require the "AvailableName" table.

5.3. References

This menu gives access to the various subsidiary tables in the system that are needed to describe the contents in the main tables.  As there is no automation needed in these tables, no fancy forms are created.  Instead, simple views directly to the tables are provided.  The user can edit the contents as needed, but should pay lots of attention to the standardisation of the IDs, if data is intended for exportation.  A new record can be created using the standard Microsoft Access record navigator at the bottom of the window.

5.3.1. Collection

The fields are a unique identifier "GUID", name of the "Keeper", "Thisflag" that denotes whether the collection and this database are the same, and the "Codeset".  There are other fields where paths to the default browser, the database location, and dump location can be given.

It is possible to also describe other collections in this table, although currently the application does not make use of this.

5.3.2. Person

This table requires no description.  "Person ID" is the key used widely in the system for leg., det., and coll.  As explained in section 3.1, you must have your own record here.

"Person IDs" have not yet been standardised.  Some codes are used in Baptria reports, some others in RUBIN.  Biotella does not yet come with RUBIN person codes.

5.3.3. Team

This is an associative table between persons and observations.  ("Period ID" probably has no real business here.)

5.3.4. Certification

This is a special table that one day will be used to store information how well the persons are able to identify bugs in certain taxonomic groups.  For instance, the export function should not accept any Eupithecia records from this author unless a recognised scientific authority provides a digital certificate that this person has passed its rigorous tests.

5.3.5.  Country

We allow 3-letter ISO codes here as keys as they are better to print in labels than the 2-letter codes.  Co-ordinates are stored here in order to facilitate some simple co-ordinate tests on data.

5.3.6.  Region

"Country ID" and "Region ID" together form a primary key.  It is worth making note here that the "Language" used in these codes is kept track of. A small problem (to be solved) is that currently "Region ID" in only one language is allowed in the "Municipality".  So it is pointless to enter "Kb" instead of "PK", because there are no municipalities under it for time being.

5.3.7. Municipality

"Country ID" and "Municipality ID" form the primary key together, because the latter has been standardised in many countries.   Therefore, the same "Municipality ID" alone may point to several places (say 180, in Finland is Jyväskylä mlk, but in Sweden Stockholm). Therefore, in queries, always include the country.

5.3.8. Method

This table describes how the observations were made.  Often when speaking of trapping, it is worthwhile to separate the "Attractant" and "Catch Device".  Standardisation of codes is needed here.

5.3.9. Specimen quality

This is the Helsinki Entomological Exchange classification "Prima", "Secunda", "Defecta". The 4th category defies description.

5.3.10. Literature

To be done.

5.3.11. Image archive

To be done.

5.4. Analysis

The analysis capabilities of the microcomputer version of Biotella are quite limited at this point.  Contributions in this area are sorely needed. The menu presents a few cross tabulations only.

The first menu item allows browsing by taxon the catches and specimens.

The second menu item allows counting the numbers and quantities of taxa at each site.

The third menu item allows listing all taxa observed in a certain year and during the past N (14) days.

The fourth menu item opens a pretty useful generic search form that can be used to generate listings of sightings that fall within certain criteria.  Currently it is possible to define a time box, a space box in either geographical or flat grid co-ordinates, country, "Site Code" and a single taxon.  Unlike country, "Site Code" is not actually used in search, but is just used to set up the co-ordinates for search.  Therefore you may alter that before hitting "Search".  If you do not limit any of the given default values but just hit "Search", you will get everything in the database.  The default values of the flat grid co-ordinates are for Finland, but can be changed.

Finally, there is a hyperlink to the central data warehouse on the web, where you can send your data in dump files (5.6.1) for others to see.

5.5.  Help

The help menu presents the user with version, copyright, and licence information, some email addresses where to turn when in trouble, from where to download new versions, and a hyperlink to this document.

5.6. Database

We finally describe here the first of the main menus.  It should be pretty useful as it allows import and export in various formats.

5.6.1. Dump database

This menu item opens a form where the user can initiate a complete dumping of all the relevant tables into comma-separated text files.  The idea is to use a different computer in your network for this dump, and therefore the preset location is U:\pyyexport.  If that location is not suitable, it must be changed in all places in the macro "ExportTxtAll", or mapped to something, like C:\temp.

The purpose of this functionality is two-fold.  First, this format allows interchange of data to other systems.  Shell scripts are provided [on the download site] that allow importation of the database into MySQL and PostgreSQL.  Second, this provides a backup and restore mechanism that will later be used to automate migration of data after an upgrade of the Biotella application.

Be aware that this functionality creates a whole lot of temporary tables which increase your database size.  You may want to compact the database afterwards.

There is a menu item for restore database after this, but this is still under development.

5.6.2. Export XML

This menu item opens a form where the user can choose which data to export into an XML interchange file.  When you want to report your new observations to a central data warehouse, you should use this functionality.   The format of the export file has been agreed in a committee and is understood by central database servers.  The actual interchange format is documented in a series of DTD and XML Schema [see the download site].

You must choose at least a country for the data that you export.  The reason is that geographic co-ordinates differ from one country to another and an appropriate format must be chosen by the export macro.    You can also define a space box and time box from which to export data, if needed.

The system also adds some metainformation in the header of the XML file which describes the source of the data.  These come from the "Collection" and "Person" tables.  You can edit these if needed.

After these choices, click preview which allows you to see what is going to be exported. Then choose from Access' normal menus File --> Export --> Save as Type --> Text Files.

Be aware that the generated file is rather large, contains redundant fields, and does not contain all the data as the dump [5.6.1.] does.  The reason for this is that each "Observation" element in the XML interchange file is independent and has been designed to be screened by automatic quality control programs at the receiving central data warehouse.

5.6.3. Import XML

This menu item still has to be done.  The idea is that it should be possible to download from a central site selected observations in the same format as above.  Another important thing to be created is a return receipt.  After the above export file has been processed by the central site, it should return another XML file that contains the results.  Some observations were accepted and other, suspicious ones returned for verification.  Importing this results file should mark some of your records already received ok by the central database so that they don't have to be sent again.  Indeed, all records in Biotella are identified by so called Globally Unique Identifiers [see what is a GUID].

5.6.4. Print labels

This menu item allows printing labels that can be pinned into collection specimens.  The basic elements here are of course those of the "Specimen" table. Therefore the user can choose the minimum and maximum "Specimen ID" numbers.  It is also possible to limit the range in time.  A country must again be chosen for the appropriate co-ordinates to appear.

After these selections, the "Preview" button can be clicked.  The labels will be shown there, but as they must be done in very fine print, the user probably wants to zoom in for a closer look before printing them.

The labels actually consist of 3 elements.  The first one presents the usual "Country": "Region": "Municipality" co-ordinates, time, and leg. information.  The second element presents the taxon name and gender.  There also is the "Specimen ID" and collection name.   The idea is that this part will be on the underside of the folded over label.  The third element is meant to be discarded but is useful when cutting the labels and trying to identify into which specimens they actually should go.  It shows the "Site Code", "Quality", "Spread", "Certainty", and "Taxon Code" attributes.

Labels should only be printed using laser printers!  Ink jet is not water proof.

After accepting the printed labels, click "Mark as printed".  This way the "Labelled" attribute of these specimens is changed and they will not be printed again.  Do not press this button unless you have accepted the results and are definitively finished with labelling these specimens.  Printing these labels again will require clearing the "Labelled" field of these specimens from the "Specimen" table.

5.6.5. Return to Access

It is sometimes necessary to escape from the Biotella menus and application to the plain Microsoft Access interface.  This is necessary for instance when the database needs to be compacted after a dump.

Of course this also allows the user to modify any aspect of the system and source code of Biotella.

5.6.6. Quit

This function closes the database and quits Microsoft Access.

The work is not over, however.  You need to consider backing up your database if you have entered any data that you are not capable or willing to create again and do not want to lose. The best idea is to regularly copy the one and only database file (it has a .MDB extension) to a writable CD-ROM.  It is also advisable to copy the dump files here at least once a year, because they can be read also in the year 2099 when Microsoft Access may not be around anymore. You should store these backup copies in a different building. You should also regularly perform restore exercises, i.e., to make sure that the backup copies you create actually are readable.

6. Conclusion

This version of Biotella is the second one offered to the public.  It is certain that bugs, functional glitches, and unexpected features will appear when more users try it.  Any feedback to the email address below is appreciated.

However, as the application is open for scrutiny and modifications, any user who is not too much scared about computers should find it easy to fix any problems and improve on the functionalities.  Here, Biotella is unique. There are several entomological PC applications available, but none of them have released the source for their users.  Compared with the Lepidata application, Biotella's advantage is the fact that all data is stored only once.  This results in much faster use.  The Nocturna application of Finnish Environment Institute's moth monitoring scheme [Söderman and Tahvanainen 1994] is powerful for storing trap catches, but is not very useful for general observations or collection management.  Australian BioLink is powerful, but does not come with European taxa.

This database application has been released now, because there is a need to enable amateur entomologists to contribute their observations to emerging central data warehouses on the web.  For example, the Lepidopterological Society of Finland has such a project.  The other Scandinavian entomological societies also operate web-based reporting systems.  The role of Biotella is to complement such central web-based systems by acting as a data feed mechanism.  Future releases of Biotella will almost certainly see sophisticated peer-to-peer data swapping functions.

Even without these communication needs, Biotella should be useful as replacement of personal paper based notebooks.  More and more amateur entomologists have turned away from collecting insects to just making field records and photographing.  Such an activity, if serious, almost obligatorily requires a database.  Biotella can fulfil this function.  However, its analysis capabilities should be much improved for that purpose.  They have been let to less attention because the idea has been that the analysis happens on a central computer via a web interface.  Voluntary contributions in this area are appreciated.

7. Acknowledgements

I want to thank Jere Kahanpää and Janne Sinkkonen for discussions that have led to improvement of Biotella data model, and Jorma Kelo for useful suggestions on documentation and functions.

8. References

ArcData Online http://www.esri.com/data/online/ Environmental Systems Research Insitute, Redlands, California.
Biolink http://www.biolink.csiro.au/
Haljala, S. 1997.  WGS-84:n ja KKJ:n välinen koordinaattimuunnos.  Ohjelmaseloste.  Maanmittauslaitos, Maastotietokeskus. http://www.nls.fi/kartta/palvelut/wgs84/
Karsholt, O. & Razowski, J. 1996.  The Lepidoptera of Europe.  A distributional checklist.  380 p, CD-ROM.  Apollo Books, Stenstup.
Ollikainen, M. 2000.  The Finnish national grid coordinate system.  National Land Survey. http://www.nls.fi/kartta/julkaisu/kkj.html
Nordic Code Centre archive http://www.nrm.se/ncc/
Saarenmaa, H., Kaila, E. & Kyrki, J. 1984. Automaattisen tietojenkäsittelyn mahdollisuudet entomologisten havaintotietojen hallinnassa. Abstract: Outlines of an entomological data bank. Luonnon Tutkija 88: 154-163.
Söderman, G. & Tahvanainen, P. 1994.  Moth monitoring scheme.  Environmental Report 8, 63 p.  National Board of Waters and Environment, Helsinki. http://www.vyh.fi/tutkimus/seuranta/yoperho.htm



 
Document version 005.  Based on release Biotella2001f2.  2001-11-04 7:12.
Feedback to Hannu Saarenmaa, Östra Odarslöv 171, S-22592 Lund, hannu@bioshare.org
Download from http://www.bioshare.net/serv/download/biotella/