Copyright © 2020 Ashok P. Nadkarni. All rights reserved.
1. Introduction
The tarray extension implements typed arrays and the associated commands column and table.
This page provides reference documentation for
commands related to typed tables. See Introduction for an overview and
Programmer’s guide for a programming guide.
1.1. Installation and loading
Binary packages for some platforms are available from the Sourceforge download area. See the build instructions for other platforms.
To install the extension, extract the files from the distribution to any directory that is included in your Tcl installation’s auto_path variable.
Once installed, the extension can be loaded with the standard Tcl package require command.
% package require tarray
→ 1.0.0
% namespace import tarray::table
1.2. Tables
A typed table is an ordered sequence of typed columns of equal size. It can be viewed as an array of records where the record fields happen to use column-wise storage. The corresponding table command operates on typed tables.
The columns in a table are defined with a name, type and order when the table is created. Commands that operate on tables allow columns to be specified using either the column name or its position.
1.3. Types
All elements in a column must be of the type specified when the column is created. The following element types are available:
Keyword | Type |
---|---|
any | Any Tcl value |
string | A string value |
boolean | A boolean value |
byte | Unsigned 8-bit integer |
double | Floating point value |
int | Signed 32-bit integer |
uint | Unsigned 32-bit integer |
wide | Signed 64-bit integer |
The primary purpose of the type is to specify what values can be stored in that column. This affects the compactness of the internal storage (really the primary purpose of the extension) as well as certain operations (like sort or search) invoked on the column.
The types any and string are similar in that they can hold any Tcl value. Both are treated as string values for purposes of comparisons and operators. The difference is that the former stores the value using the Tcl internal representation while the latter stores it as a string. The advantage of the former is that internal structure, like a dictionary, is preserved. The advantage of the latter is a significantly more compact representation, particularly for smaller strings.
Attempts to store values in a column that are not valid for that column will result in an error being generated.
1.4. Indices
An index into a typed column or table can be specified as either an integer or the keyword end. As in Tcl’s list commands, end specifies the index of the last element in the tarray or the index after it, depending on the command. Simple arithmetic that adds an offset to end is supported, for example end-2 or end+5.
Many commands also allow multiple indices to be specified. These may take one of two forms: a range, which includes all indices between a lower and an upper bound (inclusive), and an index list, which may be one of the following:
- a Tcl list of integers
- a column of any type other than boolean. The value of each element of the column is converted to an integer that is treated as an index.
- a column of type boolean. Here the index of each bit in the boolean column that is set to 1 is treated as an index.
Note that the keyword end can be used to specify a single index or as a range bound, but cannot be used in an index list.
When indices are specified that cause a column or table to be extended, they must include all indices beyond the current column or table size in any order but without any gaps. For example,
% set I [column series 5]
→ tarray_column int {0 1 2 3 4}
% column place $I {106 105 107 104} {6 5 7 4}
→ tarray_column int {0 1 2 3 104 105 106 107}
% column place $I {106 107} {6 7}
Ø tarray index 6 out of bounds.
The first column place call succeeds: the indices are not in order but there are no gaps. The second call fails because no value is specified for the non-existing index 5.
2. Command reference
All commands are located in the tarray namespace.
2.1. Standard Options
Many commands take one or more of the standard options shown in Standard options below. The -list, -dict and -table options control the format of the returned values. The -columns option allows selection and ordering of specific columns from the table.
2.2. Commands
table column TABLE COLSPEC ?NEWCOL?
If argument NEWCOL is not present, the command returns the table column specified by COLSPEC which may be either the column name or its position. If NEWCOL is specified, it must be a column of the same type and length as the table column specified by COLSPEC. The command then returns TABLE with that table column replaced by NEWCOL.
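As a minimal sketch of both forms of table column, the following reads a column by name and by position and then replaces it. The table contents and variable names are made up for illustration, and commands are written fully qualified so the snippet does not depend on a prior namespace import.

```tcl
package require tarray

# Hypothetical sample data used only for illustration.
set emps [tarray::table create {Name string Salary uint} {
    {Sally 70000}
    {Tom   65000}
}]

# Fetch the Salary column by name and by position (position 1).
set byName [tarray::table column $emps Salary]
set byPos  [tarray::table column $emps 1]

# Replace the Salary column. The new column must have the same
# type (uint) and length (2) as the one it replaces.
set raised  [tarray::column create uint {71000 66000}]
set updated [tarray::table column $emps Salary $raised]

# $emps itself is unchanged; $updated holds the modified table.
puts [tarray::table range -list $updated 0 end]
```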
table columns TABLE ?COLSPECS?
If argument COLSPECS is not present, the command returns a list containing all the columns in the specified table. If COLSPECS is specified, it must be a list of column names or positions. In this case the returned list only contains the corresponding columns.
table create DEFINITION ROWVALUES
Returns a table containing a sequence of columns. DEFINITION is a list of alternating column names and column types. A column name is an identifier for a column that can be used in lieu of a column index. The type for a column must be one of the valid types described in Types.
ROWVALUES is the initial content of the table, specified as a nested list in which each sublist is a row whose element types are compatible with the corresponding column types in DEFINITION.
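For illustration, a table might be created as in the sketch below. The column names and row data are hypothetical; the column types are taken from the list in Types.

```tcl
package require tarray

# DEFINITION alternates column names and types; ROWVALUES is a list of rows.
set emps [tarray::table create {Name string Salary uint Dept string} {
    {Sally 70000 Engineering}
    {Tom   65000 Sales}
}]

puts [tarray::table width $emps]        ;# number of columns
puts [tarray::table definition $emps]   ;# definition usable with table create

# The definition can be reused to create an empty table of the same shape.
set empty [tarray::table create [tarray::table definition $emps] {}]
```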
table create2 COLNAMES COLUMNS
Returns a table whose column names are specified by COLNAMES and whose contents are given by COLUMNS, which must be a list of tarray columns.
table csvexport ?options? OUTPUT TABLE
Writes out the contents of TABLE in CSV format to the Tcl channel or file specified by OUTPUT. In the case of a file, if the file already exists, an error is raised unless either the -force or the -append option is specified. The -force option causes existing files to be overwritten. The -append option specifies that the CSV data should be appended to the end of the existing file content. Neither option has any effect if OUTPUT is a channel.
The -header option may be used to write out a header row to the file. The option value should generally be a list of the same length as the number of columns in the table although that is not mandated.
The command accepts the options -encoding and -translation with the same semantics as for the Tcl fconfigure command.
Any additional options are passed on to the tclcsv::csv_write command and control the CSV dialect (separators, terminators, quoting etc.) of the generated output. Refer to the documentation for that command for available options.
table csvimport ?options? INPUT
Returns a table containing the data formatted as CSV from the source specified by INPUT, which may be a Tcl channel or file. The data is read using the tclcsv package. If the CSV file includes a header, it is used to form the column names for the table, with characters that are illegal in column names replaced by underscores. If the file does not have a header, column names of the form COL_N are generated.
The command accepts the options -encoding and -translation with the same semantics as for the Tcl fconfigure command.
If the -sniff switch is specified, the tclcsv::sniff command is used to guess the format of the CSV file.
Any additional options are passed on to the tclcsv::reader command. These allow specification of the CSV dialect (separators, terminators, quoting etc.) of the input data. Refer to the documentation for that command for available options. Any options specified this way override the values discovered via the -sniff option.
table definition TABLE
Returns the definition of the specified table in a form that can be passed to table create.
table dbimport resultset RESULTSET TABLEVAR
Appends the contents of a TDBC result set object RESULTSET to the tarray table stored in the variable TABLEVAR in the caller’s context. The result set column types must be compatible with the corresponding columns of the tarray table. In case of errors, the original table is unmodified.
table dbimport table DBCONN DBTABLE ?COLNAMES?
Returns a table containing the contents of the database table named DBTABLE from the TDBC connection object DBCONN. COLNAMES should be a list of columns from which data is to be returned. If unspecified, all columns are returned.
The names of the columns in the returned tarray table are as returned by the database query result set. However, when the table is empty, the query result set does not specify column names. In that case, the column names are as specified by the caller or, if unspecified, those returned by the TDBC connection object (these may differ from the actual names in character case).
The database column types are mapped according to the following table.
|
|
|
|
|
|
|
|
|
|
Anything else |
|
Note in particular that the precise numeric types decimal and numeric are mapped to imprecise floats. If this is not desirable, for example because mapping to type any would be preferable, use the table dbimport resultset RESULTSET TABLEVAR command instead. The same applies if the above mapping is unsuitable for any other reason.
table delete TABLE LOW HIGH
Returns a typed table with all specified rows deleted. The row indices are specified in any of the forms described in Indices.
table equal TABA TABB
Returns 1 if the specified tables have the same number of columns and the column equal command returns true for every corresponding pair of columns in the two tables. Note that the column types need not be the same. See the description of that command for details.
The command will raise an error if either argument is not a table.
Also see the related command table identical which applies a stricter definition of equality.
table fill ?-columns COLUMNS? TABLE ROWVALUE LOW HIGH
Returns a typed table with the specified rows set to ROWVALUE. Each element from the list ROWVALUE is assigned to the corresponding column of the table at the specified indices. Any additional elements in ROWVALUE are ignored. An error is generated if any element of ROWVALUE does not match the type of the corresponding column or if the ROWVALUE width differs from the table width. Indices are specified in any of the forms described in Indices. The table will be extended if necessary. The index end refers to the last row of the table, so to append rows the index must be specified as end+1.
The standard option -columns may be specified to target specific columns of the table or to match the order of columns to the supplied data.
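The following sketch shows one way table fill could be used on existing rows, first across all columns and then restricted to a single column with -columns. The data and variable names are illustrative assumptions, not part of the extension.

```tcl
package require tarray

# Hypothetical three-row table.
set t [tarray::table create {Name string Score int} {
    {a 1}
    {b 2}
    {c 3}
}]

# Set rows 0 through 1 to the same row value. A new table is returned;
# the table in variable t is not modified.
set filled [tarray::table fill $t {x 0} 0 1]

# Target only the Score column, so ROWVALUE holds a single element.
set partial [tarray::table fill -columns {Score} $t {99} 2 2]

puts [tarray::table range -list $filled 0 end]
puts [tarray::table range -list $partial 0 end]
```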
table get ?OPTIONS? TABLE INDEXLIST
Returns the values from a table at the indices specified as an index list. Any of the standard options may be specified with this command.
table identical TABA TABB
Returns 1 if the specified tables have the same column names and the column identical command returns true for every corresponding pair of columns in the two tables. Note that the column types have to be the same. See the description of that command for details.
The command will raise an error if either argument is not a table.
Also see the related command table equal which applies a looser definition of equality.
table inject ?-columns COLUMNS? TABLE ROWVALUES FIRST
Inserts ROWVALUES, a list of rows or a table compatible with TABLE, at the position FIRST and returns the resulting table. If FIRST is end, the values are appended to the table. In all cases, the command may extend the table if necessary.
The standard option -columns may be specified to match the order of columns to the supplied data. Note that COLUMNS must include all columns in the table as the command would not know what values to use for the unspecified columns.
table insert ?-columns COLUMNS? TABLE ROWVALUE FIRST ?COUNT?
Inserts COUNT (default 1) rows with value ROWVALUE at position FIRST and returns the new table. In all cases, the command may extend the table if necessary.
The standard option -columns may be specified to match the order of columns to the supplied data. Note that COLUMNS must include all columns in the table as the command would not know what values to use for the unspecified columns.
table join ?options? TABLE0 TABLE1
Returns a new table containing a subset of rows from the cross product of TABLE0 and TABLE1 that satisfy a condition that the value of a specified column in TABLE0 matches that of a specified column in TABLE1.
The -on option controls the columns of the two tables that are matched. The option value must be a list of one or two elements. If the list has a single element, it must be a column name that is present in both tables. If two elements are present, they must be the name of a column in TABLE0 and a column in TABLE1 respectively. If the -on option is not specified, the value defaults to the column name that is common to both tables. If there are multiple such column names, the one with the lowest index position in TABLE0 is used.
The columns being compared must be of the same type, which must not be boolean.
If the -nocase option is specified, the column elements are compared in case-insensitive fashion. Otherwise, the comparison is case-sensitive. The option is ignored for numeric columns.
By default, the returned table will include all columns from both tables. If this is not required, the -t0cols and -t1cols options may be used to specify the columns to include. The option values are lists of column names from TABLE0 and TABLE1 respectively.
If the two tables have column names in common, the returned table will add the suffix t1 to the corresponding columns from TABLE1. The caller can choose a different suffix by specifying the -t1suffix option.
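A join on a shared column name might look like the sketch below. The two tables and their contents are hypothetical. The -t1cols option is used here to pull in only one column from the second table, which also sidesteps the column name clash described above.

```tcl
package require tarray

# Hypothetical employee and department tables sharing a Dept column.
set emps [tarray::table create {Name string Dept string} {
    {Sally Engineering}
    {Tom   Sales}
}]
set depts [tarray::table create {Dept string Location string} {
    {Engineering Boston}
    {Sales       Chicago}
}]

# Match rows where Dept is equal in both tables. All columns of emps are
# kept; only Location is taken from depts.
set joined [tarray::table join -on Dept -t1cols {Location} $emps $depts]

puts [tarray::table range -list $joined 0 end]
```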
table place ?-columns COLUMNS? TABLE ROWVALUES INDICES
Returns a table with the specified values at the corresponding indices. ROWVALUES may be a list of row values or a compatible table. The number of rows in ROWVALUES must not be less than the number of indices specified in INDICES and the width of each row must be the same as the width of the table. INDICES must be an index list in one of the forms described in Indices and may extend the table if the conditions listed there are satisfied.
The standard option -columns may be specified to target specific columns of the table or to match the order of columns to the supplied data.
table put ?-columns COLUMNS? TABLE ROWVALUES ?FIRST?
Returns a table with the elements starting at index FIRST replaced by the corresponding elements of ROWVALUES. ROWVALUES may be a list of values or a table of the same type. The command may extend the array if necessary. If FIRST is not specified the elements are appended.
The standard option -columns may be specified to target specific columns of the table or to match the order of columns to the supplied data.
table range ?OPTIONS? TABLE LOW ?HIGH?
Returns all values from a table in the specified index range LOW to HIGH. Any of the standard options may be specified with this command.
table slice TABLE COLUMNLIST
Returns a table containing only the specified columns from TABLE. The columns are specified by their positions or names as a list. A column must not be included more than once. The returned table contains columns in the same order as COLUMNLIST.
table sort ?options? TABLE COLSPEC
Sorts the specified table based on the values of the column specified by COLSPEC. The options -increasing, -decreasing and -nocase control the sort order as described for the column sort command.
If the -indices option is specified, the command returns an integer column containing the indices of the table corresponding to the sorted elements.
If -indices is not specified, the return value of the command is the sorted table. The format and content of the returned table are controlled by the -columns, -table, -dict and -list options as described in Standard options.
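As a brief sketch, the two return modes of table sort could be exercised as follows. The sample rows are invented for the example.

```tcl
package require tarray

# Hypothetical salary data.
set emps [tarray::table create {Name string Salary uint} {
    {Sally 70000}
    {Tom   65000}
    {Dick  80000}
}]

# Return the whole table sorted by Salary, highest first.
set byPay [tarray::table sort -decreasing $emps Salary]
puts [tarray::table range -list $byPay 0 end]

# Return only the row indices in ascending Salary order, as an int column.
set order [tarray::table sort -indices $emps Salary]
puts $order
```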
table summarize ?options? TABLE
The command computes an aggregation function for categorized data in TABLE, which must be of the form returned by the column categorize or column histogram commands with the -values option. TABLE must contain at least two columns. One of these, the category label column, only serves as part of the table returned by the command. The other, the data column on which aggregation is done, must be a column of type any whose elements are themselves columns, all of the same type, each containing the values belonging to that category. By default, the first table column is assumed to be the label column and the second is assumed to be the data column. The -labelcolumn and -datacolumn options may be used to specify different label and data columns.
The return value is a table with two columns, the first being the label column, unchanged. The second column, the summary column, named Summary by default, is the result of invoking an aggregation function on each nested column of values as described below. This column may be renamed through the -cname option to the command.
The aggregation function is specified by the following options:
- By default, or if the -count option is specified, the aggregation function result is simply the number of elements of the corresponding nested column within the data column. The summary column is then a column of type int.
- If the -sum option is specified, the aggregate function is the sum of the elements of the corresponding nested column (which must be of a numeric type). The type of the summary column is double if the nested columns were of that type or wide for integer types.
- Finally, if the -summarizer CMDPREFIX option is specified, the summary column values are the values returned by the command prefix CMDPREFIX, which is called with two additional arguments: the index into TABLE and the corresponding nested column at that index. The returned summary column is then of type any by default. The -summarytype TYPE option may be specified to change this to a different type.
See Summarizing categorized data for an example.
table vcolumn TABLEVAR COLSPEC ?NEWCOL?
Returns or sets a specified column in the table contained in the variable TABLEVAR. If argument NEWCOL is not present, the command returns the table column specified by COLSPEC which may be either the column name or its position. TABLEVAR is not modified.
If NEWCOL is specified, it must be a column of the same type and length as the table column specified by COLSPEC. The command then replaces that column in TABLEVAR with NEWCOL and returns the variable’s new value.
table vdelete TABLEVAR LOW HIGH
Deletes rows from the table in variable TABLEVAR. The new value is assigned back to the variable. The resulting value of the variable (which may differ because of traces) is returned as the result of the command. Indices are specified in any of the forms described in Indices.
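The difference between the value-based and variable-based commands can be sketched with table delete and table vdelete. The table contents and variable names are illustrative only.

```tcl
package require tarray

# Hypothetical four-row table.
set t [tarray::table create {Id int Tag string} {
    {1 a}
    {2 b}
    {3 c}
    {4 d}
}]

# table delete takes a table value and returns a new table;
# the variable t still holds all four rows afterwards.
set trimmed [tarray::table delete $t 0 1]

# table vdelete takes a variable name and modifies the table stored in it.
tarray::table vdelete t 2 end

puts [tarray::table range -list $trimmed 0 end]
puts [tarray::table range -list $t 0 end]
```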
table vfill ?-columns COLUMNS? TABLEVAR ROWVALUE LOW HIGH
Set the elements of the table in variable TABLEVAR at the specified indices to ROWVALUE. The new value is assigned back to the variable. The resulting value of the variable (which may differ because of traces) is returned as the result of the command.
The standard option -columns may be specified to target specific columns of the table or to match the order of columns to the supplied data.
See the table fill command for more information.
table vinject TABLEVAR ROWVALUES FIRST
Inserts ROWVALUES, a list of rows or a table compatible with the table in variable TABLEVAR, at the position FIRST and stores the result back in TABLEVAR. If FIRST is end, the values are appended to the table. In all cases, the command may extend the table if necessary. The resulting value of the variable (which may differ because of traces) is returned as the result of the command.
The standard option -columns may be specified to match the order of columns to the supplied data. Note that COLUMNS must include all columns in the table as the command would not know what values to use for the unspecified columns.
table vinsert ?-columns COLUMNS? TABLEVAR ROWVALUE FIRST ?COUNT?
Inserts COUNT rows (default 1) with value ROWVALUE at position FIRST in the table stored in variable TABLEVAR. The new value is assigned back to the variable. The resulting value of the variable (which may differ because of traces) is returned as the result of the command.
The standard option -columns may be specified to match the order of columns to the supplied data. Note that COLUMNS must include all columns in the table as the command would not know what values to use for the unspecified columns.
table vplace ?-columns COLUMNS? TABLEVAR ROWVALUES INDICES
Modifies a table stored in the variable TABLEVAR with the specified values at the corresponding indices. The new value is assigned back to the variable. The resulting value of the variable (which may differ because of traces) is returned as the result of the command.
The standard option -columns may be specified to target specific columns of the table or to match the order of columns to the supplied data.
See the command table place for other details.
table vput ?-columns COLUMNS? TABLEVAR ROWVALUES ?FIRST?
Modifies a table stored in variable TABLEVAR in caller’s context. The rows of the table starting at index FIRST are replaced by the corresponding elements of ROWVALUES. If FIRST is not specified the elements are appended to the array. The new value is assigned back to the variable. The resulting value of the variable (which may differ because of traces) is returned as the result of the command.
The standard option -columns may be specified to target specific columns of the table or to match the order of columns to the supplied data.
See the command table put for other details.
table vreverse TABLEVAR
Reverses the order of elements in the table in variable TABLEVAR and stores the result back in the variable. The result of the command is the resulting value stored in the variable.
table width TABLE
Returns the number of columns in the table.