Distribution Upload HOWTO
From ISP_RAS
How To Collect And Upload Distribution Data
There are two ways to collect distribution data:
- Using an installed distribution.
- Using the set of rpm or deb packages that the distribution consists of.
This document covers only the second way. It is preferable, since it doesn't require the distribution to be installed. The first way requires all packages that should be uploaded to be installed (usually ~1000 packages, though the number varies between systems); this takes a lot of time, and we don't recommend it. (However, see step 6 in the algorithm below.)
The Data Collection Algorithm
The following steps should be performed to collect data using deb or rpm packages:
- Call the AnalyzePkg.pl script to detect which packages we are interested in. Pass the script the path where the distribution packages are located using the '-p' or '--path' option:
./AnalyzePkg.pl -p /path/to/distribution/packages
For deb packages, also pass the '-d' option:
./AnalyzePkg.pl -p /path/to/distribution/deb/packages -d
The script will create a directory named PKG with files containing the analysis results.
Note: One can use mounted media (CD or DVD) as a source of packages, but the data collection process will be faster if all packages are located on a hard drive (simply because file access and copy speed is higher), so using a mounted iso image is preferable.
- Call distrtodb.pl to actually collect the data. For rpm packages, the script should be called with the '-r' option; for deb packages, '-d' should be used. You should also specify the distribution architecture using the '-a' option:
./distrtodb.pl -r -a x86 DistrName DistrVersion PKG/component_list_sorted
The script takes three arguments: the distribution name, the version, and the list of packages to be uploaded (the latter is created for you automatically by AnalyzePkg.pl, but you may use your own if needed).
- If everything is ok, a file with distribution data should be created. The file is named '<DistrName>_<DistrVersion>_<Arch>_upload_data' by default.
Before proceeding to the next step, rename this file or save it to some other location:
mv <DistrName>_<DistrVersion>_<Arch>_upload_data <DistrName>_<DistrVersion>_<Arch>_upload_data_main
- Now let's collect data about interpreted languages. Due to the algorithms used in the upload tools, we cannot place this data in the file created in step 2. That's why we renamed that file in step 3; otherwise it would be overwritten in the next steps.
And now, let's simply get all packages containing 'py' or 'perl' in their names:
find /path/to/distribution/packages -iname "*py*" -exec cp \{\} . \;
find /path/to/distribution/packages -iname "*perl*" -exec cp \{\} . \;
And create a list of them:
ls *rpm >component_list
We don't have an AnalyzePkg.pl analogue for interpreted languages yet. However, the 'find by name' approach has proved to be rather precise; there will surely be some garbage, but not much. We can only recommend reviewing the created 'component_list' manually and removing those entries that are useless from the analysis point of view (for example, packages that contain 'py' in their names but don't actually provide any python modules).
- Now let's call distrtodb.pl once more:
./distrtodb.pl -r -a x86 DistrName DistrVersion component_list
Again, '<DistrName>_<DistrVersion>_<Arch>_upload_data' will be created. We recommend renaming it to something more descriptive:
mv <DistrName>_<DistrVersion>_<Arch>_upload_data <DistrName>_<DistrVersion>_<Arch>_upload_data_perl_python
Note: Do NOT merge this file with the file created in step 2.
- (OPTIONAL) There are some built-in python modules whose presence can be checked only in an installed system (with python installed, of course). The absence of information about these modules in the database is not considered a critical issue at the moment, so this step is optional.
To obtain this data, execute distrtodb.pl in the installed system with the '--dump_python' option:
touch dummy_list
./distrtodb.pl --dump_python -r -a x86 DistrName DistrVersion dummy_list
rm dummy_list
- Now we have three (or two, if step 6 was skipped) '*upload_data' files. This is the final result of the data collection process; these files should now be processed by the upload tools.
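Steps 3 and 5 both depend on moving the default output file out of the way before distrtodb.pl is run again. A minimal shell sketch of that bookkeeping is given here. The function names upload_data_name and save_upload_data are hypothetical helpers, not part of the upload tools; the only thing taken from this document is the default file name '<DistrName>_<DistrVersion>_<Arch>_upload_data'.

```shell
# Hypothetical helpers, NOT part of the upload tools.

# Print the default output file name described in step 3:
# <DistrName>_<DistrVersion>_<Arch>_upload_data
upload_data_name() {
    printf '%s_%s_%s_upload_data\n' "$1" "$2" "$3"
}

# Rename the default output file by appending a suffix such as
# "main" or "perl_python", refusing to overwrite an earlier result.
# Usage: save_upload_data DistrName DistrVersion Arch suffix
# The body runs in a subshell, so 'exit' only ends the function.
save_upload_data() (
    src=$(upload_data_name "$1" "$2" "$3")
    dst="${src}_$4"
    if [ ! -f "$src" ]; then
        echo "missing $src: did distrtodb.pl succeed?" >&2
        exit 1
    fi
    if [ -e "$dst" ]; then
        echo "$dst already exists, not overwriting" >&2
        exit 1
    fi
    mv -- "$src" "$dst"
)
```

With these helpers, the rename in step 3 would be 'save_upload_data DistrName DistrVersion x86 main', and the one in step 5 would use the suffix 'perl_python'.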
How does this work?
Now a few words about the algorithms used by the data collection tools.
The AnalyzePkg.pl script detects which packages provide the data we are interested in. We don't collect data about all distribution components; instead, we have a list of shared library sonames stored in the 'library_list' file, and for every distribution we collect data about the components that provide libraries with those sonames. The list is built on the basis of application needs: if we see that applications use some library, we add that library to the list.
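The selection rule described here can be illustrated with a small shell sketch. It is not the actual AnalyzePkg.pl implementation. It assumes that 'library_list' holds one soname per line (the real file format is not described in this document) and that the sonames provided by one package have already been extracted by some other means. The function names are hypothetical.

```shell
# Illustration only; NOT how AnalyzePkg.pl is implemented.
# ASSUMPTION: the list file holds one soname per line.

# Print every soname from stdin that also appears in the list file.
# Usage: interesting_sonames library_list < sonames_of_one_package
interesting_sonames() {
    # -F: fixed strings, -x: whole-line match, -f: patterns from file
    grep -Fx -f "$1"
}

# Succeed if at least one soname on stdin is in the list file,
# i.e. the package provides a library we want to collect data about.
package_is_interesting() {
    interesting_sonames "$1" >/dev/null
}
```

Whole-line matching matters here: without '-x', a list entry such as 'libz.so.1' would also match 'libz.so.10'.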
NOTE: The list of libraries implements the 'approved libraries' concept; however, the ApprovedLibrary table in the database and the 'library_list' file have different meanings. 'library_list' should contain all sonames from the ApprovedLibrary table, but not vice versa. For entries in the ApprovedLibrary table we guarantee data completeness, i.e. if a soname is present in this table, users can be sure that we have looked for libraries with that soname in every uploaded distribution. So anyone who adds a new entry to the 'library_list' file should collect data about the new library for all distributions that are already present in the database. Only after that can the entry be added to the ApprovedLibrary table.
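The rule that 'library_list' must contain every soname from the ApprovedLibrary table can be checked mechanically. The sketch below assumes that the table's sonames have been exported to a plain text file, one per line, and that 'library_list' uses the same layout. Both the export file and the function name are hypothetical; this document does not describe how to dump the table.

```shell
# Print sonames that are in the approved export but missing from
# 'library_list'. Empty output means the rule holds.
# ASSUMPTION: both files hold one soname per line; the approved
# file is a hypothetical export of the ApprovedLibrary table.
# Usage: missing_from_library_list approved.txt library_list
missing_from_library_list() {
    # -v inverts the match: keep approved lines that have no exact
    # (-F -x) counterpart in library_list. grep exits non-zero when
    # nothing is printed, so '|| true' keeps the "all good" case from
    # looking like a failure (it also hides other grep errors).
    grep -Fxv -f "$2" -- "$1" || true
}
```

Extra entries in 'library_list' are not reported, which matches the "but not vice versa" part of the rule.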
We also have a similar list of commands stored in the 'command_list' file. This is simply a list of all commands included in the LSB (to tell the truth, there are some more related commands in this list).
We don't have any analogues of 'approved' lists for perl or python; the 'find' approach works well enough, since almost all files found this way are really of interest to us. Nevertheless, we are planning to develop an AnalyzePkg.pl analogue for perl/python in the future.
A detailed description of distrtodb.pl can be found in the README_collect file or on the Collect Distribution Information page.
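Until such a tool exists, the manual review of 'component_list' can be partly automated by looking at each package's file list. The sketch below is only a heuristic: it assumes that a '.py' or '.pm' file suffix is a good enough sign that a package provides python or perl modules, and the function name provides_modules is hypothetical.

```shell
# Succeed if the file listing on stdin contains at least one
# python (.py) or perl (.pm) module.
# ASSUMPTION: the file suffix alone is a sufficient signal; this
# is a heuristic, not the rule used by the upload tools.
provides_modules() {
    grep -Eq '\.(py|pm)$'
}
```

For an rpm file the listing can be produced with 'rpm -qpl package.rpm', and for a deb file with 'dpkg-deb -c package.deb'; packages for which the check fails are candidates for removal from 'component_list'.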