Copyright © 2016 Ashok P. Nadkarni. All rights reserved.
This document is a programmer’s guide for installing and using the tarray
extension from Tcl. It does not list or detail every command implemented
by the extension. See the command reference pages accessible from the
Main Table of Contents for that information.
Note: Typed arrays can also be manipulated using Xtal, a language embeddable in Tcl that is geared towards typed arrays and vector operations. However, Xtal is for the most part not described in this guide. See The Xtal Language for details on its use.
1. Introduction
The extension implements two data types - columns and tables. The general term typed array is used to refer to either of these. A typed column is an array containing elements of a single type that is specified when the column is created. The command tarray::column can be used to create and manipulate typed columns.
A typed table is an ordered sequence of named columns of equal size. It can also be viewed as an array of records where the record fields happen to use column-wise storage. The corresponding tarray::table command operates on typed tables. Columns in a table can be referenced using either their name or their position in the ordered sequence.
The extension places its commands in the tarray namespace. The primary commands implemented by the extension are column and table, each being an ensemble of subcommands that operate on columns and tables respectively.
Other commands provide functionality like iteration and formatting that are independent of the data type.
2. Installation and loading
Binary packages for some platforms are available from the Sourceforge download area. See the build instructions for other platforms.
To install the extension, extract the files from the distribution to any directory that is included in your Tcl installation’s auto_path variable.
Once installed, the extension can be loaded with the standard Tcl package require command.
The examples in this guide assume the commands have been imported into the calling namespace, as shown below, or are in its namespace path.
% package require tarray
→ 0.8
% namespace import tarray::column tarray::table tarray::print
If in addition you want to use the Xtal language, you need to load its package as well.
% package require xtal
→ 0.8
% namespace import xtal::xtal
3. Types
All elements in a typed column must be of the type specified when the column is created. The following element types are available:
Keyword | Type |
---|---|
any | Any Tcl value |
string | A string value |
boolean | A boolean value |
byte | Unsigned 8-bit integer |
double | Floating point value |
int | Signed 32-bit integer |
uint | Unsigned 32-bit integer |
wide | Signed 64-bit integer |
The primary purpose of the type is to specify what values can be stored in that column. This impacts the compactness of the internal storage (really the primary purpose of the extension) as well as certain operations (like sort or search) invoked on the column.
The types any and string are similar in that they can hold any Tcl value. Both are treated as string values for purposes of comparisons and operators. The difference is that the former stores the value using the Tcl internal representation while the latter stores it as a string. The advantage of the former is that internal structure, like a dictionary, is preserved. The advantage of the latter is a significantly more compact representation, particularly for smaller strings.
Attempts to store values in a column that are not valid for that column will result in an error being generated.
4. Indices
An index into a typed column or table can be specified as either an integer or the keyword end. As in Tcl’s list commands, end specifies the index of the last element in the tarray or the index after it, depending on the command. Simple arithmetic adding of offsets to end is supported, for example end-2 or end+5.
Many commands also allow multiple indices to be specified. These may take one of two forms: a range, which includes all indices between a lower and an upper bound (inclusive), and an index list, which may be a list of integers or a column of type int. The latter allows the indices returned by commands such as column search to be efficiently passed to other commands.
When indices specified as a list cause an array to be extended, the index list must include all indices beyond the current array size, in any order but without any gaps. For example, if an array contains a thousand elements (the highest index thereby being 999), the index list 1001 1000 1002 is legal but 1001 1002 is not.
Note that the keyword end can be used to specify a single index or as a range bound, but cannot be used in an index list.
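The no-gaps rule for index lists that extend an array can be modeled in a few lines of plain Tcl. The sketch below only illustrates the rule as stated above and is not how the extension implements it. The helper index_list_ok is hypothetical (it is not a tarray command), and its handling of duplicate indices (they are ignored) is an assumption.

```tcl
# Pure-Tcl model of the "no gaps" rule; index_list_ok is a hypothetical
# helper, not part of tarray. size is the current number of elements.
proc index_list_ok {size indices} {
    # Sort ascending and drop duplicates (duplicate handling is an assumption)
    set sorted [lsort -integer -unique $indices]
    set expected $size
    foreach i $sorted {
        if {$i < 0} { return 0 }
        # Indices inside the current array never create a gap
        if {$i < $size} { continue }
        # Each index past the end must be exactly the next free slot
        if {$i != $expected} { return 0 }
        incr expected
    }
    return 1
}

puts [index_list_ok 1000 {1001 1000 1002}]   ;# 1 (legal)
puts [index_list_ok 1000 {1001 1002}]        ;# 0 (index 1000 is missing)
```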
5. Creating columns and tables
The create subcommand creates columns and tables.
% column create int
→ tarray_column int {}
will create a typed column that can hold elements of the int type. Note that the command returns a value that would normally be assigned to a variable.
The column can be initialized at creation time.
% column create int {0 1 2 3}
→ tarray_column int {0 1 2 3}
creates a column and initializes the first four elements.
Note: Applications should not depend on the string representation of a column or table as that is liable to change. Use only the tarray package commands to create and manipulate typed arrays.
The array will be grown as needed but as an optimization, preallocation may be requested.
% column create int {0 1 2 3} 1000
→ tarray_column int {0 1 2 3}
will request a preallocation of a thousand elements with the first four being initialized.
Alternatively, columns containing equally spaced values can be created with the series command.
% column series 10
→ tarray_column int {0 1 2 3 4 5 6 7 8 9}
% column series 5 -5 -2
→ tarray_column int {5 3 1 -1 -3}
% column series 10.0
→ tarray_column double {0.0 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0}
The first command produces a series from 0 (default) to 10 with step 1 (default). The second produces a series decreasing from 5 to -5 with step -2. The third produces a series of doubles instead of integers.
Tables can be created and initialized in an analogous fashion. For example, the following creates an initialized table:
% set tab [table create {
country string population wide
} {
{China 1350000000}
{Vatican 850}
}]
→ tarray_table {country population} {{tarray_column string {China Vatican}} {ta...
6. Specifying indices
Most commands require specification of the array locations to be targeted. This specification can be
- a single index,
- a contiguous range of indices, or
- a list of indices in arbitrary order, specified as a list of integers or a column of type int.
The various possibilities are illustrated below.
% set col [column create double {}]
→ tarray_column double {}
% set col [column fill $col 1.0 0 9]
→ tarray_column double {1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0}
% set col [column fill $col 2.0 3]
→ tarray_column double {1.0 1.0 1.0 2.0 1.0 1.0 1.0 1.0 1.0 1.0}
% set col [column fill $col 2.0 end-2 end]
→ tarray_column double {1.0 1.0 1.0 2.0 1.0 1.0 1.0 2.0 2.0 2.0}
% set col [column fill $col 3.0 {2 7}]
→ tarray_column double {1.0 1.0 3.0 2.0 1.0 1.0 1.0 3.0 2.0 2.0}
% set col [column fill $col 3.0 [column create int {2 7}]]
→ tarray_column double {1.0 1.0 3.0 2.0 1.0 1.0 1.0 3.0 2.0 2.0}
In order, the commands above create a new column and then fill it with indices specified as the range 0 to 9, as the single index 3, as a range relative to end, as a list, and as an int column.
The last form, an integer column, is useful because some commands return indices in that form. For example, the following will replace all elements greater than 2.0 with 0.0.
% set col [column fill $col 0.0 [column search -all -gt $col 2.0]]
→ tarray_column double {1.0 1.0 0.0 2.0 1.0 1.0 1.0 0.0 2.0 2.0}
Although the above example used columns, table indices are specified in identical fashion.
7. Values and variables
Commands that modify typed arrays come in two flavors:
- Commands that operate on column and table values and return the modified column or table as a result (for example fill), and
- Commands that modify a Tcl variable containing the column or table (for example vfill).
The difference is similar to how different Tcl list commands behave, e.g. linsert and lreplace versus lset and lappend.
The examples above used the value-oriented form of the commands, where fill modifies a copy of the contents of the col variable and returns the modified copy, which is then stored back into col. For large typed arrays, this is inefficient and the above would be better written as
% set col [column create double {}]
→ tarray_column double {}
% column vfill col 1.0 0 9
→ tarray_column double {1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0}
% column vfill col 2.0 3
→ tarray_column double {1.0 1.0 1.0 2.0 1.0 1.0 1.0 1.0 1.0 1.0}
% column vfill col 3.0 {2 7}
→ tarray_column double {1.0 1.0 3.0 2.0 1.0 1.0 1.0 3.0 1.0 1.0}
% column vfill col 3.0 [column create int {2 7}]
→ tarray_column double {1.0 1.0 3.0 2.0 1.0 1.0 1.0 3.0 1.0 1.0}
Here the vfill command directly modifies the variable and, assuming the content is not shared, no copy needs to be made.
Almost every command that modifies a typed array has this dual equivalent.
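The same distinction can be seen with Tcl's own list commands, without loading the extension. The following plain Tcl sketch contrasts lreplace, which returns a modified value, with lset, which modifies the list held in a variable. The variable names are chosen only for illustration.

```tcl
set values {1 2 3}

# Value-oriented: lreplace returns a modified copy and leaves the
# variable untouched, like column fill
set copy [lreplace $values 1 1 20]
puts $values   ;# 1 2 3
puts $copy     ;# 1 20 3

# Variable-oriented: lset takes a variable name and modifies its
# contents, like column vfill
lset values 1 20
puts $values   ;# 1 20 3
```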
8. Storing data
Modifying a typed array may involve either storing a single value at multiple target locations or a different value at each target location. Further, the locations may be a contiguous range or a noncontiguous set of indices.
- The fill and vfill commands store a single value at one or more locations, either contiguous or noncontiguous.
- The place and vplace commands store each value from a sequence of values at one or more noncontiguous locations in a specified order (not necessarily sequential).
- The put and vput commands store each value from a sequence of values in contiguous locations starting at a specified index.
The sequence of values to be stored may be specified as a Tcl list or a typed array. When multiple noncontiguous target locations are specified, they may be specified as a Tcl list of integers or an int column.
% column vplace col {300.0 200.0 500.0} {3 2 5}
→ tarray_column double {1.0 1.0 200.0 300.0 1.0 500.0 1.0 3.0 1.0 1.0}
% column vput col {7.0 8.0 9.0} 6
→ tarray_column double {1.0 1.0 200.0 300.0 1.0 500.0 7.0 8.0 9.0 1.0}
% column vput col {11.0 12.0}
→ tarray_column double {1.0 1.0 200.0 300.0 1.0 500.0 7.0 8.0 9.0 1.0 11.0 12.0}
The first command stores the specified values at indices 3, 2 and 5. The second stores the specified values at indices 6, 7 and 8. The third appends the specified values.
Instead of specifying values as a list, they may also be specified as a column of the same type.
% set colA [column create double {1.0 2.0 4.0}]
→ tarray_column double {1.0 2.0 4.0}
% set colB [column create double {}]
→ tarray_column double {}
% column vplace colB $colA {1 0 2}
→ tarray_column double {2.0 1.0 4.0}
Again, this form is particularly useful when storing columns returned from commands into another column.
The table command has equivalent commands. For example
% set populations [table create {country string population wide} {}]
→ tarray_table {country population} {{tarray_column string {}} {tarray_column w...
% table vput populations {{China 1350000000} {Vatican 850}}
→ tarray_table {country population} {{tarray_column string {China Vatican}} {ta...
Note that just like in the case of columns, the list of values can be specified as a table instead, provided the column types are the same.
When storing data, the -columns option comes in handy for two different purposes. First, it allows data to be specified in a different order than that specified in the column definition. For example, in the above example, if the order of the supplied data was population followed by country, the command could have been written as follows:
% table vput -columns {population country} populations {{1350000000 China} {850 \
Vatican}}
→ tarray_table {country population} {{tarray_column string {China Vatican China...
There is no need to reorder the fields in the input data.
Second, the -columns option allows modification of a subset of the columns. For example,
% set populations [table create {country string population wide} {}]
→ tarray_table {country population} {{tarray_column string {}} {tarray_column w...
% table vput populations {{China 1350000000} {Vatican 850}}
→ tarray_table {country population} {{tarray_column string {China Vatican}} {ta...
% table vfill -columns {population} populations {851} 1
→ tarray_table {country population} {{tarray_column string {China Vatican}} {ta...
The population of the second table row is changed to 851.
A point to be noted about all the above commands is that they may extend the size of the array if necessary. However, two conditions must apply for this:
- First, where an index list or index column is specified, there must not be any gaps in indices that extend the array.
- Second, if the -columns option is specified, it must include all columns (in any order) of the table. Otherwise, the command will not know what value to use for the other columns when extending the table.
As an illustration,
% set populations [table create {country string population wide} {{Vatican 850} \
{China 1350000000}}]
→ tarray_table {country population} {{tarray_column string {Vatican China}} {ta...
% table place $populations {{Vatican 860} {India 1250000000} {USA 314000000}} {0 3 2}
% table place $populations {{Vatican 860} {India 1250000000} {USA 314000000}} {1 3 4}
The first place will succeed, changing the existing value at index 0 and extending the array by two rows (note that the order of indices does not matter). The second place will raise an error since index 2 neither exists nor is supplied in the command.
All the commands discussed to this point overwrite existing values at the target locations. The column insert and column vinsert commands and their table equivalents, table insert and table vinsert, store a single repeated value or row into the typed array, pushing existing elements further up. Similarly, the column inject and column vinject commands and their table equivalents, table inject and table vinject, insert multiple values or rows (passed as a list or a typed array).
% column insert $col 3.0 2 10
→ tarray_column double {1.0 1.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 200.0 3...
% column inject $col {1.0 2.0 3.0} 2
→ tarray_column double {1.0 1.0 1.0 2.0 3.0 200.0 300.0 1.0 500.0 7.0 8.0 9.0 1...
% column inject $col $col 2
→ tarray_column double {1.0 1.0 1.0 1.0 200.0 300.0 1.0 500.0 7.0 8.0 9.0 1.0 1...
The first command returns a new column with the same value, 3.0, inserted 10 times at index 2. The second command returns a new column with all values in the passed list, 1.0, 2.0, 3.0, inserted at index 2. The last command returns a new column where all existing values in the column are reinserted at index 2.
9. Deleting data
Elements in a typed array can be deleted with the delete and vdelete commands. Succeeding elements are moved up to occupy the deleted slots. Like the fill command, the indices of the elements to be deleted may be specified as a single index, a range, a list of indices or an index column.
% column vdelete col [column search -all -lt $col 0]
→ tarray_column double {1.0 1.0 200.0 300.0 1.0 500.0 7.0 8.0 9.0 1.0 11.0 12.0}
will delete all negative elements from the column.
10. Retrieving data
Retrieving data from a typed array involves specifying which elements to retrieve and what format to retrieve them in when multiple elements are retrieved.
As usual, the elements to be retrieved can be specified as a single index, an index range, a list of indices or an index column. In the simplest case, the index command can be used to retrieve a single element.
% column index $col 4
→ 1.0
% table index $tab end
→ Vatican 850
Multiple elements can be retrieved with the get and range commands. The get command can be passed a sequence of noncontiguous indices specified as a Tcl list or an int column:
% column get $col {10 7 4}
→ tarray_column double {11.0 8.0 1.0}
% table get $tab [column search -all -lt [table column $tab 0] 0]
→ tarray_table {country population} {{tarray_column string {}} {tarray_column w...
The range command retrieves elements in a specified index range.
% column range $col 3 5
→ tarray_column double {300.0 1.0 500.0}
% table range $tab 0 10
→ tarray_table {country population} {{tarray_column string {China Vatican}} {ta...
By default, both commands return values as a typed array. The -list and -dict options can be used to return the values as a Tcl list or dictionary instead. In the latter case, the dictionary keys are the indices being retrieved.
% column get -list $col {3 5}
→ 300.0 500.0
% table get -dict $tab [column search -all -lt [table column $tab 0] 0]
In the case of tables, both commands also provide for retrieval of a subset of columns and in a different order than in the definition.
% table range -columns {population country} $populations 0 end
→ tarray_table {population country} {{tarray_column wide {850 1350000000}} {tar...
% table range -columns {1} $populations 0 end
→ tarray_table {population} {{tarray_column wide {850 1350000000}}}
Note columns may be specified either by position or name.
Tables provide additional commands for retrieving entire columns.
- table column returns a column from a table. This is useful for sorting and searching columns as shown in the table search examples below.
- table slice returns a new table containing a subset of the columns of a table.
11. Searching and filtering
The column search command works similarly to Tcl’s lsearch. It returns the indices (by default) or the values (with the -inline option) of matching elements in a column. Like lsearch, column search stops on the first match and returns the matching index or value, but the -all option can be used to return all matches. The command supports several matching operators. See the column search command reference for a full list.
% column search $col 0
→ -1
returns the index of the first element that is 0 using the default matching operator, which tests for equality (this assumes col is a numeric column).
% column search -inline -gt $col 0
→ 1.0
returns the value of the first positive element.
% column search -all -gt $col 0
→ tarray_column int {0 1 2 3 4 5 6 7 8 9 10 11}
returns the indices of all positive elements. The return value is an int column.
% set exes [column create string {tclsh.exe tclsh.man wish.exe}]
→ tarray_column string {tclsh.exe tclsh.man wish.exe}
% column search -all -inline -nocase -pat $exes *.exe
→ tarray_column string {tclsh.exe wish.exe}
returns the values of all elements that match *.exe using case-insensitive matching as in Tcl’s string match -nocase.
The search can be restricted to only look at specific elements using a combination of the -range and -among options.
% column search -range {0 9} $col 0
→ -1
limits the search to the first ten elements.
% column search -among {1 5 3} $col 0
→ -1
only examines the elements at positions 1, 5, and 3 in that order. The option -among is particularly useful in combining searches as in the table example below.
To search tables, use column search on individual columns. For example,
% set countries [table create {country string population wide area double} {
{Vatican 850 0.44}
{China 1350000000 9.55e6}
{USA 314000000 9.63e6}
{India 1250000000 3.3e6}
{Russia 141930000 17e6} }]
→ tarray_table {country population area} {{tarray_column string {Vatican China ...
% set pop_col [table column $countries 1]
→ tarray_column wide {850 1350000000 314000000 1250000000 141930000}
% set area_col [table column $countries area]
→ tarray_column double {0.44 9550000.0 9630000.0 3300000.0 17000000.0}
% table get -list -columns {country} $countries [column search -all -among [column \
search -all -gt $pop_col 250000000] -gt $area_col 5e6]
→ China USA
returns names of countries that are populous and large in area. In the commands above, the population column is specified by position and the area column by name. Note how the outer search is limited to specific indices using the -among option.
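The way -among chains one search onto the result of another can be modeled in plain Tcl on ordinary lists. The sketch below is only an illustration of the idea, using the population and area figures from the table above. The helper search_gt is hypothetical and is not a tarray command.

```tcl
# search_gt is a hypothetical pure-Tcl stand-in for
# "column search -all -gt", with an optional list of indices to examine
# that plays the role of the -among option.
proc search_gt {values threshold args} {
    if {[llength $args] == 0} {
        # No restriction given: examine every index
        set among {}
        for {set i 0} {$i < [llength $values]} {incr i} { lappend among $i }
    } else {
        set among [lindex $args 0]
    }
    set matches {}
    foreach i $among {
        if {[lindex $values $i] > $threshold} { lappend matches $i }
    }
    return $matches
}

set population {850 1350000000 314000000 1250000000 141930000}
set area       {0.44 9.55e6 9.63e6 3.3e6 17e6}

# Inner search over all rows, outer search only over the inner matches
set populous [search_gt $population 250000000]
set both     [search_gt $area 5e6 $populous]
puts $populous   ;# 1 2 3
puts $both       ;# 1 2
```

Because the outer search only looks at indices that already passed the inner one, its result is directly the set of rows satisfying both conditions.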
The column intersect3 command offers another way to search across multiple columns as described later.
Note: For more complex queries, it is more convenient to use the Xtal extension instead of some combination of the commands above.
12. Sorting and ordering
Columns can be sorted using the column sort command or its variable targeting analogue column vsort. The commands take the -increasing and -decreasing options to determine the sort order.
The column sort command also takes the -indices option which results in the indices being returned instead of the values themselves. This is useful for sorting tables based on a column. For example, assuming variable countries has been initialized as above,
% table get -list $countries [column sort -indices -nocase [table column \
$countries 0]]
→ {China 1350000000 9550000.0} {India 1250000000 3300000.0} {Russia 141930000 1...
returns rows in the sorted order based on country name.
When sorting tables, for display purposes for example, it is often necessary to display elements that have the same value in the sort column in the same order that they were previously displayed. Although individual column sorts are stable, this is not enough when sorting across multiple columns. In such cases, the -indirect option to the sort command provides a solution. Using this option allows sorting where the "initial" ordering of elements is different from the actual order of elements in the column. An example will clarify this.
Consider a table that stores heights and weights.
% set tab [table create {name string height int weight int} {
{Jeff 180 80}
{John 175 80}
{Jim 170 75} }]
→ tarray_table {name height weight} {{tarray_column string {Jeff John Jim}} {ta...
The user may choose to sort the table by height which boils down to the following code:
% table get -list $tab [column sort -indices [table column $tab height]]
→ {Jim 170 75} {John 175 80} {Jeff 180 80}
This results in the table being displayed in the order Jim, John, Jeff.
The user may then choose to sort by weight.
% table get -list $tab [column sort -indices [table column $tab weight]]
→ {Jim 170 75} {Jeff 180 80} {John 175 80}
resulting in a display in order Jim, Jeff, John. Since they actually have the same value in the new sort column, this interchange of positions between Jeff and John is disconcerting to the user. Use of the -indirect option overcomes this problem.
% set indices [column sort -indices [table column $tab height]]
→ tarray_column int {2 1 0}
% table get -list $tab $indices
→ {Jim 170 75} {John 175 80} {Jeff 180 80}
Now use the previous order of indices to order elements when their values in the weight column are equal:
% table get -list $tab [column sort -indirect [table column $tab weight] $indices]
→ {Jim 170 75} {John 175 80} {Jeff 180 80}
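In this last statement, the sort is done indirectly: the index column is ordered by comparing the corresponding values in the weight column of the table, but the positioning of elements when these values compare equal is based on their order in indices, that is, the order in which they were previously displayed.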
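The effect of an indirect sort can be reproduced with ordinary Tcl lists, relying on the fact that lsort is a stable merge sort. The sketch below is only a model of the behavior described above, using the same height and weight data. The helper order_by is hypothetical and not part of tarray.

```tcl
# Rows are indexed 0=Jeff, 1=John, 2=Jim as in the table above
set names   {Jeff John Jim}
set heights {180 175 170}
set weights {80 80 75}

# order_by is a hypothetical helper: it reorders the index list "indices"
# by comparing the values those indices select from "values".
proc order_by {values indices} {
    set pairs {}
    foreach i $indices {
        lappend pairs [list $i [lindex $values $i]]
    }
    set result {}
    # lsort is stable, so pairs with equal values keep their incoming order
    foreach pair [lsort -integer -index 1 $pairs] {
        lappend result [lindex $pair 0]
    }
    return $result
}

set by_height [order_by $heights {0 1 2}]
set by_weight [order_by $weights $by_height]
puts $by_height   ;# 2 1 0
puts $by_weight   ;# 2 1 0

set display {}
foreach i $by_weight { lappend display [lindex $names $i] }
puts $display     ;# Jim John Jeff
```

Sorting by weight starting from the height ordering leaves John ahead of Jeff, since their weights are equal and the earlier order is preserved.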
Another form of reordering data is reversing the order of elements. Both columns and tables support reverse and vreverse commands which reverse the order of elements, an operation that is useful in many algorithms.
% print [table column $tab name]
→ Jeff
John
Jim
% print [column reverse [table column $tab name]]
→ Jim
John
Jeff
13. Arithmetic operations
The column math command can be used to perform arithmetic operations on columns on a per-element basis. The command takes multiple arguments, each of which may be a column or a scalar numeric value. For example,
% set I [column create int {10 20 30}]
→ tarray_column int {10 20 30}
% set J [column create double {1.1 2.2 3.3}]
→ tarray_column double {1.1 2.2 3.3}
% column math + $I $J 1000
→ tarray_column double {1011.1 1022.2 1033.3}
As a convenience, the above command can also be issued as
% column + $I $J 1000
→ tarray_column double {1011.1 1022.2 1033.3}
See the description of column math for all the available operators.
In contrast to arithmetic commands that operate on a per-element basis, some commands operate on the entire column.
The column sum command sums all the elements in a column.
% column sum $J
→ 6.6
% column sum [table column $populations population]
→ 1350000850
The column minmax command returns a pair containing the minimum and maximum values in a column.
% column minmax [table column $populations population]
→ 850 1350000000
Note that this command is not restricted to numeric columns and will work for other types as well. Also, it has the useful -indices option which returns the indices of the minimum and maximum values instead of the values themselves.
% set indices [column minmax -indices [table column $populations population]]
→ 0 1
% column get -list [table column $populations country] $indices
→ Vatican China
14. Counting elements
The column size and table size commands return the number of elements in a column or table.
% table size $populations
→ 2
If you are only interested in the count for elements that match specific criteria, you can use the column count command instead. Thus
% column count -gt [table column $populations population] 1000000000
→ 1
returns the number of countries with more than a billion people.
Note: Just as for searches, for more complex criteria, it is more convenient to use an Xtal query instead.
15. Formatting
The Tcl puts command is not always suitable for printing the value of a column or table because the output is not formatted and is hence difficult to read. The print command provides an alternative that outputs a more readable format.
% print [table column $countries population] -head 1 -tail 1
→ 850, ..., 141930000
% print $countries
→ +-------+----------+----------+
|country|population| area|
+-------+----------+----------+
|Vatican| 850| 0.44|
+-------+----------+----------+
...Additional lines omitted...
By default the command only prints the first few and last few elements although this can be controlled by various options.
The prettify command is another alternative which returns the formatted string instead of printing it to a channel.
16. Introspection
The type of a column can be retrieved with the column type command.
% column type [table column $countries population]
→ wide
The table cnames command returns a list containing the names of the columns in a table.
% table cnames $countries
→ country population area
If the full table definition is desired, it can be retrieved with table definition. The returned string is in a form that can be used with table create.
% table definition $countries
→ country string population wide area double
17. Other commands
The column lookup command provides for a faster, dictionary based retrieval for string columns. This may be beneficial for columns used as keys in a table.
The column intersect3 command returns the intersection and the differences of two columns. The multiple column search example above could also have been written as follows:
% set pop_col [table column $countries population]
→ tarray_column wide {850 1350000000 314000000 1250000000 141930000}
% set area_col [table column $countries area]
→ tarray_column double {0.44 9550000.0 9630000.0 3300000.0 17000000.0}
% set populous [column search -all -gt $pop_col 250000000]
→ tarray_column int {1 2 3}
% set large [column search -all -gt $area_col 5e6]
→ tarray_column int {1 2 4}
% table get -list $countries [lindex [column intersect3 $populous $large] 0]
→ {China 1350000000 9550000.0} {USA 314000000 9630000.0}
In most cases, the previous method using column search -among is likely to be faster. However, the column intersect3 command may be faster in some cases, for example, when multiple combinations are desired.
% lassign [column intersect3 $populous $large] populous_and_large \
populous_but_small sparse_but_large
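The three-way result can be pictured with a small plain Tcl model operating on the index lists shown above. This is a sketch for illustration only. The helper intersect3_model is hypothetical, runs in quadratic time (unlike the extension, which can exploit sorted columns), and its removal of duplicates is an assumption.

```tcl
# intersect3_model is a hypothetical pure-Tcl helper, not part of tarray.
# It returns three lists: elements in both a and b, elements only in a,
# and elements only in b.
proc intersect3_model {a b} {
    set both {}
    set only_a {}
    set only_b {}
    foreach x [lsort -integer -unique $a] {
        if {$x in $b} { lappend both $x } else { lappend only_a $x }
    }
    foreach x [lsort -integer -unique $b] {
        if {$x ni $a} { lappend only_b $x }
    }
    return [list $both $only_a $only_b]
}

# Index lists matching the populous and large searches above
lassign [intersect3_model {1 2 3} {1 2 4}] \
    populous_and_large populous_but_small sparse_but_large
puts $populous_and_large   ;# 1 2
puts $populous_but_small   ;# 3
puts $sparse_but_large     ;# 4
```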
18. Usage Hints
When specifying indices to commands, Tcl lists of integers and columns of type int are usually interchangeable. Similarly, when passing multiple values to a command, either a Tcl list or a column of the appropriate type can be used. Note there is an ambiguity in the specific case where the target of the command is a column of type any and the passed operand is a string of the form tarray any {…}, where the operand can be interpreted either as a column or as a Tcl list with three elements tarray, any and the {…}. In this case the operand gets interpreted as a column.
Given that the tarray extension is meant for dealing with large amounts of data, it is useful to keep in mind Tcl’s object reference counting and copy-on-write implementation. Modifying a typed array that is shared will result in a copy being made, which can be expensive if the array is large. So, to modify a variable that contains a typed array, the command
% column vput col [list 1.0 2.0]
→ tarray_column double {1.0 1.0 200.0 300.0 1.0 500.0 7.0 8.0 9.0 1.0 11.0 12.0...
is far more efficient than
% set col [column put $col [list 1.0 2.0]]
→ tarray_column double {1.0 1.0 200.0 300.0 1.0 500.0 7.0 8.0 9.0 1.0 11.0 12.0...
assuming the value in col is not itself shared. This is similar to the use of Tcl’s lset command to modify lists.
As arrays get large, tarray prioritizes memory usage over speed. As arrays grow, the additional memory is allocated conservatively (unlike Tcl, which aggressively allocates extra memory). If the size of a typed array can be estimated in advance, for example when reading records from a database, the memory can be preallocated.
% column create int {} 1000000
→ tarray_column int {}
preallocates space for a million elements.
Typed arrays are by design implemented as consecutive elements in contiguous memory. Certain operations, such as insertion and deletion, will not be efficient when arrays get very large. For applications where such operations are common, other structures should be built on top using typed arrays as the lower level building blocks. Such higher level structures can be scripted and customized for specific usage patterns easily as they can be implemented at the script level using the low level typed array operations for efficiency. Whether this is required or not should be determined based on application benchmarks.
Lists and columns differ somewhat in functionality. Columns do not have the -stride option but the same functionality can be implemented through tables. List indexing supports nesting, whereas nested columns, although possible, have to be explicitly accessed. On the other hand, columns offer some additional functions such as intersect3 and indexing operations (e.g. extraction or storing of multiple elements through index lists).
Columns of type any are stored as Tcl_Obj objects internally and thus are very similar to Tcl lists. Any advantage of an any typed array over using a simple Tcl list in terms of the memory footprint comes only from conservative memory overallocation, not from reduced memory size of individual elements. It is therefore not as big a benefit as for other types. Thus columns of type any are mostly beneficial when used in conjunction with other column types, for example in a table.
Note that columns of type string are more efficient than type any for storing small strings.
An element of type any can be any Tcl value, including typed arrays. Typed arrays can therefore be nested (tables are currently implemented as nested columns). However, unlike some of the Tcl list commands, tarray does not have commands that implicitly support nesting. Nested typed arrays have to be explicitly accessed as such.
column index [column index $outer_column 4] 0
The package internally keeps track of the sorting state of a column. A column is internally marked after certain operations where the result is known to be sorted. An obvious example is the column sort command. A less obvious case is an index column returned from certain search operations. Several commands make use of this for more efficient operation. For example, the column intersect3 command is much faster when columns are known to be sorted. Thus finding the intersection of two index columns resulting from searches is an O(n) operation.