Base drudge system

The base drudge system handles the part of program logic universally applicable to any tensor and noncommutative algebra system.

Building blocks of the basic drudge data structure

Symbolic ranges

class drudge.Range(label, lower=None, upper=None)[source]

A symbolic range that can be summed over.

This class is for symbolic ranges that are to be summed over in tensors. Each range should have a label, and optionally lower and upper bounds, which should be either both given or both absent. The label can be any hashable and ordered Python type. The bounds are not directly used for symbolic computation; they are intended for printers and for conversion to SymPy summation. Note that ranges are assumed to be atomic and disjoint. Even in the presence of lower and upper bounds, unequal ranges are assumed to be disjoint.

Warning

Ranges with the same label but different bounds will be considered unequal. Although no error will be given, using different bounds with an identical label is strongly advised against.

Warning

Unequal ranges are always assumed to be disjoint.

__init__(label, lower=None, upper=None)[source]

Initialize the symbolic range.

label

The label of the range.

lower

The lower bound of the range.

upper

The upper bound of the range.

size

The size of the range.

This property gives None for unbounded ranges. For bounded ranges, it is the difference between the upper and lower bounds. Note that this contradicts the deeply entrenched mathematical convention of including both ends of a range, but it gives a lot of convenience and elegance.

bounded

If the range is explicitly bounded.

args

The arguments for range creation.

When the bounds are present, we have a triple of the label and the two bounds; otherwise we have a singleton tuple of only the label.
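The conventions described for size, bounded, args, and equality can be sketched in plain Python. The class below, ToyRange, is a hypothetical stand-in written only for illustration and is not drudge's implementation; in particular, computing the size as upper minus lower is an assumption read off the description above.

```python
class ToyRange:
    """Hypothetical stand-in for a symbolic range (not drudge's class)."""

    def __init__(self, label, lower=None, upper=None):
        # The bounds must be both given or both absent.
        if (lower is None) != (upper is None):
            raise ValueError("lower and upper must be given together")
        self.label, self.lower, self.upper = label, lower, upper

    @property
    def bounded(self):
        return self.lower is not None

    @property
    def size(self):
        # Assumed convention: the upper end is not counted.
        return self.upper - self.lower if self.bounded else None

    @property
    def args(self):
        # A triple when bounded, a singleton tuple of the label otherwise.
        if self.bounded:
            return (self.label, self.lower, self.upper)
        return (self.label,)

    def __eq__(self, other):
        # Same label with different bounds compares unequal.
        return isinstance(other, ToyRange) and self.args == other.args

    def __hash__(self):
        return hash(self.args)


occupied = ToyRange("O", 0, 4)
print(occupied.size)                    # 4
print(ToyRange("O").size)               # None
print(occupied == ToyRange("O", 0, 5))  # False
```

With integer bounds 0 and 4 the toy range has size 4, which matches the usual Python range(0, 4) rather than the closed interval from 0 to 4.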

__hash__()[source]

Hash the symbolic range.

__eq__(other)[source]

Compare equality of two ranges.

__repr__()[source]

Form the representative string.

__str__()[source]

Form readable string representation.

sort_key

The sort key for the range.

replace_label(new_label)[source]

Replace the label of a given range.

The bounds will be the same as the original range.

__lt__(other)[source]

Compare two ranges.

This method is meant to skip explicit calling of the sort key when it is not convenient.

Noncommutative quantities

class drudge.Vec(label, indices=())[source]

Vectors.

Vectors are the basic non-commutative quantities. Each vector consists of a label for its base and some indices. The label is allowed to be any hashable and ordered Python object, although small objects, like strings, are advised. The indices are always sympified into SymPy expressions.

Vectors can be created directly by giving the label and indices, or existing vector objects can be subscripted to get new ones. The semantics is similar to Haskell functions.

Note that users cannot directly assign to the attributes of this class.

This class can be used by itself. It can also be subclassed for special use cases.

Despite its very different internal data structure, this class attempts to emulate the behaviour of the SymPy IndexedBase class.

__init__(label, indices=())[source]

Initialize a vector.

An atomic index is added as the only index. An iterable value will have all of its entries added.

label

The label for the base of the vector.

base

The base of the vector.

This base can be subscripted to get other vectors.

indices

The indices to the vector.

__getitem__(item)[source]

Append the given indices to the vector.

When multiple new indices are to be given, they have to be given as a tuple.
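The subscripting behaviour can be illustrated with a small pure-Python sketch. ToyVec is a hypothetical class made up for this example; unlike the real class it keeps indices as plain Python objects rather than sympifying them.

```python
class ToyVec:
    """Hypothetical stand-in for an indexed vector (not drudge's class)."""

    def __init__(self, label, indices=()):
        self.label = label
        # An atomic index becomes the only index; tuples and lists are unpacked.
        if isinstance(indices, (tuple, list)):
            self.indices = tuple(indices)
        else:
            self.indices = (indices,)

    def __getitem__(self, item):
        # Subscripting appends indices and returns a new object.
        new = item if isinstance(item, tuple) else (item,)
        return ToyVec(self.label, self.indices + new)

    def __eq__(self, other):
        return (isinstance(other, ToyVec)
                and (self.label, self.indices) == (other.label, other.indices))

    def __hash__(self):
        return hash((self.label, self.indices))

    def __repr__(self):
        return "ToyVec({!r}, indices={!r})".format(self.label, list(self.indices))


v = ToyVec("v")
print(v["a"]["b"] == v["a", "b"])  # True
print(v.indices)                   # ()
```

Indexing one index at a time gives the same object as indexing with a tuple, and the original base is left unchanged, which is the sense in which the behaviour resembles partially applied functions.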

__repr__()[source]

Form the repr string from the vector.

__str__()[source]

Form a more readable string representation.

__hash__()[source]

Compute the hash value of a vector.

__eq__(other)[source]

Compares the equality of two vectors.

sort_key

The sort key for the vector.

This is a generic sort key for vectors. Note that this is only useful for sorting the simplified terms and should not be used in the normal-ordering operations.

map(func)[source]

Map the given function to indices.

terms

Get the terms from the vector.

This is for the user input.

Single-term with noncommutative quantities and symbolic summations

class drudge.Term(sums: typing.Tuple[typing.Tuple[sympy.core.symbol.Symbol, drudge.term.Range], ...], amp: sympy.core.expr.Expr, vecs: typing.Tuple[drudge.term.Vec, ...], free_vars: typing.FrozenSet[sympy.core.symbol.Symbol] = None, dumms: typing.Mapping[sympy.core.symbol.Symbol, drudge.term.Range] = None)[source]

Terms in tensor expression.

This is the core class for storing symbolic tensor expressions. The actual symbolic tensor type is just a shallow wrapper over a list of terms. It is basically comprised of three fields, a list of summations, a SymPy expression giving the amplitude, and a list of non-commutative vectors.

__init__(sums: typing.Tuple[typing.Tuple[sympy.core.symbol.Symbol, drudge.term.Range], ...], amp: sympy.core.expr.Expr, vecs: typing.Tuple[drudge.term.Vec, ...], free_vars: typing.FrozenSet[sympy.core.symbol.Symbol] = None, dumms: typing.Mapping[sympy.core.symbol.Symbol, drudge.term.Range] = None)[source]

Initialize the tensor term.

Users seldom need to create terms directly with this function. This constructor is mostly a developer function, so no sanity checking is performed on the input, for performance. Most importantly, this constructor does not copy either the summations or the vectors, and directly expects them to be tuples (for hashability). The amplitude is not sympified either.

Also, it is important that the free variables and the dummies dictionary be given only when they really agree with the actual content of the term.

sums

The summations of the term.

amp

The amplitude expression.

vecs

The vectors in the term.

is_scalar

If the term is a scalar.

args

The triple of summations, amplitude, and vectors.

__hash__()[source]

Compute the hash of the term.

__eq__(other)[source]

Evaluate the equality with another term.

__repr__()[source]

Form the representative string of a term.

__str__()[source]

Form the readable string representation of a term.

sort_key

The sort key for a term.

This key attempts to sort the terms by complexity, with simpler terms coming earlier. This capability of sorting the terms will make the equality comparison of multiple terms easier.

This sort key also ensures that terms that can be merged are always put into adjacent positions.

terms

The singleton list of the current term.

This property is for the rare cases where direct construction of tensor inputs from SymPy expressions and vectors is not sufficient.

scale(factor)[source]

Scale the term by a factor.

mul_term(other, dumms=None, excl=None)[source]

Multiply with another tensor term.

Note that by this function, the free symbols in the two operands are not automatically excluded.

comm_term(other, dumms=None, excl=None)[source]

Commute with another tensor term.

In the same way as the multiplication operation, here the free symbols in the operands are not automatically excluded.

reconcile_dumms(other, dumms, excl)[source]

Reconcile the dummies in two terms.

exprs

Loop over the SymPy expressions in the term.

Note that the summation dummies are not looped over.

free_vars

The free symbols used in the term.

dumms

Get the mapping from dummies to their range.

amp_factors

The factors in the amplitude expression.

This is a convenience wrapper over get_amp_factors() for the case of no special additional symbols.

get_amp_factors(*special_symbs)[source]

Get the factors in the amplitude and the coefficient.

The indexed factors and factors involving dummies or the symbols in the given special symbols set will be returned as a list, with the rest returned as a single SymPy expression.

An error will be raised if the amplitude is not a monomial.

map(func=<function Term.<lambda>>, sums=None, amp=None, vecs=None, skip_vecs=False)[source]

Map the given function to the SymPy expressions in the term.

The given function will not be mapped to the dummies in the summations. When operations on summations are needed, a tuple for the new summations can be given.

With the default function, which is the identity function, this method can also be used to replace the summation list, the amplitude expression, or the vector part.

subst(substs, sums=None, amp=None, vecs=None, purge_sums=False)[source]

Perform symbol substitution on the SymPy expressions.

After the replacement of the fields given, the given substitutions are performed simultaneously using the SymPy xreplace method.

If purge_sums is set, the summations whose dummies are substituted are going to be removed.

reset_dumms(dumms, dummbegs=None, excl=None, add_substs=None)[source]

Reset the dummies in the term.

The term with its dummies reset will be returned along with the new dummy begins dictionary. Note that the dummy begins dictionary will be mutated if one is given.

ValueError will be raised when no more dummies are available.

static reset_sums(sums, dumms, dummbegs=None, excl=None)[source]

Reset the given summations.

The new summation list, substitution dictionary, and the new dummy begin dictionary will be returned.

simplify_deltas(resolvers)[source]

Simplify deltas in the amplitude of the expression.

simplify_sums()[source]

Simplify the summations in the term.

expand()[source]

Expand the term into many terms.

canon(symms=None, vec_colour=None)[source]

Canonicalize the term.

The given vector colour should be a callable accepting the index within the vector list (under the keyword idx) and the vector itself (under the keyword vec). By default, each vector has a colour equal to its index within the list of vectors.

Note that whether or not colours for the vectors are given, the vectors are never permuted in the result.

canon4normal(symms)[source]

Canonicalize the term for normal-ordering.

This is the preparation task for normal ordering. The term will be canonicalized with all the vectors considered the same. And the dummies will be reset internally according to the summation list.

has_base(base)[source]

Test if the given base is present in the current term.

Canonicalization of indexed quantities with symmetry

Some actions are supported to accompany the permutation of indices to indexed quantities. All of these accompanied actions can be composed by using the bitwise or operator |.

drudge.IDENT

The identity action. Nothing is performed for the permutation.

drudge.NEG

Negation. When the given permutation is performed, the indexed quantity needs to be negated, as for instance in an anti-symmetric matrix.

drudge.CONJ

Conjugation. When the given permutation is performed, the complex conjugate of the indexed quantity needs to be taken. Note that this action can only be used in the symmetry of scalar indexed quantities.
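How actions compose under | can be sketched with ordinary bit flags. The numeric values used below (0, 1, and 2) and the helper apply_action are assumptions made for this illustration; the values actually used by drudge should be taken from drudge.IDENT, drudge.NEG, and drudge.CONJ.

```python
# Assumed flag values, for illustration only.
IDENT, NEG, CONJ = 0, 1, 2


def apply_action(value, acc):
    """Apply a (possibly composed) accompanied action to a complex number."""
    if acc & NEG:
        value = -value
    if acc & CONJ:
        value = value.conjugate()
    return value


z = 1 + 2j
print(apply_action(z, IDENT))       # (1+2j)
print(apply_action(z, NEG))         # (-1-2j)
print(apply_action(z, NEG | CONJ))  # (-1+2j)
```

Because each action occupies its own bit in this sketch, NEG | CONJ carries both at once and IDENT | NEG is the same as NEG.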

class drudge.Perm

Permutation of points with accompanied action.

Permutations can be constructed from an iterable giving the pre-image of the points and an optional integral value for the accompanied action. The accompanied action can be given positionally or by the keyword acc, and it will be manipulated according to the convention in libcanon.

Querying the length of a Perm object gives the size of the permutation domain, while indexing it gives the pre-image of the given integral point. The accompanied action can be obtained by getting the attribute acc. Otherwise, this data type is mostly opaque.
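The interface described above (length, indexing for pre-images, and the acc attribute) can be mimicked by a small toy class. ToyPerm and its permute helper are hypothetical and exist only for this sketch; the reading that position i of the result takes the entry at position perm[i] is one plausible interpretation of "pre-image", stated here as an assumption.

```python
class ToyPerm:
    """Hypothetical stand-in for a permutation with an accompanied action."""

    def __init__(self, pre_images, acc=0):
        pre_images = tuple(pre_images)
        # The pre-image array must be a rearrangement of 0..n-1.
        if sorted(pre_images) != list(range(len(pre_images))):
            raise ValueError("not a permutation of 0..n-1")
        self._pre = pre_images
        self.acc = acc

    def __len__(self):
        # Size of the permutation domain.
        return len(self._pre)

    def __getitem__(self, point):
        # Pre-image of the given integral point.
        return self._pre[point]

    def permute(self, seq):
        # Hypothetical helper: slot i receives the entry at slot self[i].
        return tuple(seq[self[i]] for i in range(len(self)))


cycle = ToyPerm([1, 2, 0], acc=1)
print(len(cycle))                    # 3
print(cycle[0])                      # 1
print(cycle.permute(("a", "b", "c")))  # ('b', 'c', 'a')
print(cycle.acc)                     # 1
```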

acc

The accompanied action.

class drudge.Group

Permutation groups.

To create a permutation group, an iterable of Perm objects, or of pairs of a pre-image array and an action, can be given for the generators of the group. Then the Schreier-Sims algorithm in libcanon will be invoked to generate the Sims transversal system, which will be stored internally for the group. This class is mostly designed to give input for the Eldag canonicalization facility, so it is basically an opaque object after its creation.

Internally, the group can also be constructed directly from an existing transversal system, without going through the Schreier-Sims algorithm. However, that is intended more for serialization than for direct user invocation.

Primary interface

The primary drudge class

class drudge.Drudge(ctx: pyspark.context.SparkContext, num_partitions=True)[source]

The main drudge class.

A drudge is a robot who can help you with the menial tasks of symbolic manipulation for tensorial and noncommutative algebras. Due to the diversity and non-uniformity of tensor and noncommutative algebraic problems, domain-specific information about the problem needs to be given to set up a drudge. This is a base class, where the basic operations are defined. Different problems can subclass this base class with customized behaviour. Most importantly, the method normal_order() should be overridden to give the commutation rules for the algebraic system studied.

__init__(ctx: pyspark.context.SparkContext, num_partitions=True)[source]

Initialize the drudge.

Parameters:
  • ctx – The Spark context to be used.
  • num_partitions – The preferred number of partitions. By default, it is the default parallelism of the given Spark environment. An explicit integral value can also be given. It can be set to None, which disables all explicit load-balancing by shuffling.

ctx

The Spark context of the drudge.

num_partitions

The preferred number of partitions for data.

full_simplify

If full simplification is to be performed on amplitudes.

It can be used to disable full simplification of the amplitude expression by SymPy. For simple polynomial amplitudes, it is generally safe to disable full simplification.

simple_merge

If only simple merge is to be carried out.

When it is set to true, only terms with the same factors involving dummies are going to be merged. This might be helpful for cases where the amplitudes are all simple polynomials of tensorial quantities. Note that this could disable some SymPy simplification.

Warning

This option might not give much more than disabling full simplification, but it takes away many simplifications. Its use is in general not recommended.

default_einst

If def_() takes Einstein convention.

This property tunes the behaviour of def_(). When it is set, the Einstein summation convention is always assumed for the right-hand side for that function.

form_base_name(tensor_def: drudge.drudge.TensorDef) → typing.Union[str, NoneType][source]

Form the name for the base to use for tensor definitions.

This method is called by set_name() to get a formatted string for the base of the tensor definition, which is to be used as the name for the base in the name archive. None can be returned to stop the base from being added.

By default, an underscore is put in front of the string form of the base.

form_def_name(tensor_def: drudge.drudge.TensorDef) → typing.Union[str, NoneType][source]

Form the name for a tensor definition in name archive.

The result will be used by set_name() as the name of the tensor definition itself in the name archive. By default, it is just the plain string form of the base of the definition.

set_name(*args, **kwargs)[source]

Set objects into the name archive of the drudge.

For positional arguments, the str form of the given label is going to be used for the name of the object. Special treatment is given to tensor definitions: the base and the definition itself will be added under the names given by the methods form_base_name() and form_def_name().

For keyword arguments, the keyword will be used for the name.

unset_name(*args, **kwargs)[source]

Unset names from name archive.

This method is mostly used to undo the effect of set_name(). Here, names that are not actually present in the name archive will be skipped without error.

names

The name archive for the drudge.

The name archive object can be used for convenient accessing of objects related to the problem.

inject_names(prefix='', suffix='')[source]

Inject the names in the name archive into the current global scope.

This function is for the convenience of users, especially interactive users. It is not used in official drudge code except in its own tests.

Note that this function injects the names in the name archive into the global scope of the caller, rather than the local scope, even when called inside a function.

set_dumms(range_: drudge.term.Range, dumms, set_range_name=True, dumms_suffix='_dumms', set_dumm_names=True)[source]

Set the dummies for a range.

Note that this function overwrites the existing dummies if the range has already been given.

dumms

The broadcast form of the dummies dictionary.

set_symm(base, *symms, valence=None, set_base_name=True)[source]

Set the symmetry for a given base.

Permutation objects in the arguments are interpreted as single generators. Other values will be iterated over to get their entries, which should all be permutations.

Parameters:
  • base – The SymPy indexed base object or vectors whose symmetry is to be set. Their label can be used as well.
  • symms – The generators of the symmetry. It can be a single None to remove the symmetry of the given base.
  • valence (int) – When it is set, only the indexed quantity of the base with the given valence will have the given symmetry.
  • set_base_name – If the base name is to be added to the name archive of the drudge.

symms

The broadcast form of the symmetries.

add_resolver(resolver)[source]

Append a resolver to the list of resolvers.

The given resolver can be either a mapping from SymPy expressions, including atomic symbols, to the corresponding ranges, or a callable to be called with SymPy expressions. A callable resolver can return None to signal that it cannot resolve the expression. The resolution will then be dispatched to the next resolver.

add_resolver_for_dumms()[source]

Add the resolver for the dummies for each range.

With this method, the default dummies for each range will be resolved to be within the range for all of them. This method should normally be called by all subclasses after the dummies for all ranges have been properly set.

Note that dummies added later will not be automatically added. This method can be called again.

add_default_resolver(range_)[source]

Add a default resolver.

The default resolver will resolve any expression to the given range. Note that resolvers added after this one will never be invoked.
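The resolver chain described for add_resolver() and add_default_resolver() can be sketched as a simple first-match loop. The function resolve and the string stand-ins for expressions and ranges below are hypothetical and do not reflect how drudge stores or broadcasts its resolvers.

```python
def resolve(expr, resolvers):
    """Return the first non-None answer from a list of resolvers."""
    for resolver in resolvers:
        if callable(resolver):
            found = resolver(expr)
        else:
            # Mapping resolvers answer only for the keys they contain.
            found = resolver.get(expr)
        if found is not None:
            return found
    return None


VIRTUAL_NAMES = {"a", "b", "c", "d"}

resolvers = [
    {"i": "occupied", "j": "occupied"},                   # a mapping
    lambda e: "virtual" if e in VIRTUAL_NAMES else None,  # a callable
]

print(resolve("i", resolvers))  # occupied
print(resolve("a", resolvers))  # virtual
print(resolve("z", resolvers))  # None

# A catch-all resolver answers for everything, so anything appended after it
# would never be consulted.
resolvers.append(lambda e: "all")
print(resolve("z", resolvers))  # all
```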

resolvers

The broadcast form of the resolvers.

set_tensor_method(name, func)[source]

Set a new tensor method under the given name.

A tensor method is a method that can be called from tensors created from the current drudge as if it is a method of the given tensor. This could give cleaner and more consistent code for all tensor manipulations.

The given function, or bound method, should be able to accept the tensor as the first argument.

get_tensor_method(name)[source]

Get a tensor method with given name.

When the name cannot be resolved, KeyError will be raised.
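One way such tensor methods could be dispatched is through attribute lookup on the tensor, sketched below. ToyDrudge and ToyTensor are hypothetical classes for illustration; whether drudge uses __getattr__ in this way is an assumption of the sketch.

```python
import functools


class ToyDrudge:
    """Hypothetical registry of tensor methods."""

    def __init__(self):
        self._tensor_methods = {}

    def set_tensor_method(self, name, func):
        self._tensor_methods[name] = func

    def get_tensor_method(self, name):
        # Raises KeyError when the name is unknown.
        return self._tensor_methods[name]


class ToyTensor:
    """Hypothetical tensor that forwards unknown attributes to its drudge."""

    def __init__(self, drudge, terms):
        self.drudge = drudge
        self.terms = list(terms)

    def __getattr__(self, name):
        # Only reached when normal attribute lookup fails.
        drudge = self.__dict__.get("drudge")
        if drudge is None:
            raise AttributeError(name)
        try:
            func = drudge.get_tensor_method(name)
        except KeyError:
            raise AttributeError(name) from None
        # Bind the tensor as the first argument.
        return functools.partial(func, self)


dr = ToyDrudge()
dr.set_tensor_method("count_terms", lambda tensor: len(tensor.terms))
tensor = ToyTensor(dr, ["x[1] * v[1]", "x[2] * v[2]"])
print(tensor.count_terms())  # 2
```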

vec_colour

The vector colour function.

Note that this accessor accesses the function, rather than directly computes the colour for any vector.

normal_order(terms, **kwargs)[source]

Normal order the terms in the given tensor.

This method should be called with the RDD of some terms, and another RDD of terms, where all the vector parts are normal ordered according to domain-specific rules, should be returned.

By default, we work with the free algebra, so nothing is done by this function. For noncommutative algebraic systems, this function needs to be overridden to return an RDD of the normal-ordered terms from the given terms.

sum(*args, predicate=None) → drudge.drudge.Tensor[source]

Create a tensor for the given summation.

This is the core function for creating tensors from scratch. The arguments should start with the summations, each of which should be given as a sequence, normally a tuple, with a SymPy symbol for the summation dummy as its first entry. Then come one or more domains that the dummy is going to be summed over, which can be symbolic ranges, SymPy expressions, or iterables over them. When symbolic ranges are given as Range objects, the given dummy will be set to be summed over the ranges symbolically. When SymPy expressions are given, the given values will substitute all appearances of the dummy in the summand. When we have multiple summations, terms in the result are generated from their Cartesian product.

The last argument should give the actual thing to be summed, which can be something that can be interpreted as a collection of terms, or a callable that returns the summand when given a dictionary giving the action on each of the dummies. The dictionary has an entry for every dummy. A dummy summed over a symbolic range has the actual range as its value, while a dummy given a concrete value has the actual SymPy expression. In the returned summand, if dummies still exist, they are going to be treated in the same way as in statically-given summands.

The predicate can be a callable that returns a boolean when called with the same dictionary. False values can be used to skip some terms. It is guaranteed that the same dictionary will be used for both the predicate and the summand when they are given as callables.

For instance, most commonly, we can create a tensor by having simple summations over symbolic ranges,

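>>> from pyspark import SparkContext
>>> from sympy import IndexedBase, Symbol
>>> from drudge import Drudge, Range, Vec
>>> dr = Drudge(SparkContext())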
>>> r = Range('R')
>>> a = Symbol('a')
>>> b = Symbol('b')
>>> x = IndexedBase('x')
>>> v = Vec('v')
>>> tensor = dr.sum((a, r), (b, r), x[a, b] * v[a] * v[b])
>>> str(tensor)
'sum_{a, b} x[a, b] * v[a] * v[b]'

And we can also give multiple symbolic ranges for a single dummy to sum over all of them,

>>> s = Range('S')
>>> tensor = dr.sum((a, r, s), x[a] * v[a])
>>> print(str(tensor))
sum_{a} x[a] * v[a]
 + sum_{a} x[a] * v[a]

When the objects to sum over are not symbolic ranges, we are in the concrete summation mode, for instance,

>>> tensor = dr.sum((a, 1, 2), x[a] * v[a])
>>> print(str(tensor))
x[1] * v[1]
 + x[2] * v[2]

The concrete and symbolic summation mode can be put together freely in the same summation,

>>> tensor = dr.sum((a, r, s), (b, 1, 2), x[b, a] * v[a])
>>> print(str(tensor))
sum_{a} x[1, a] * v[a]
 + sum_{a} x[2, a] * v[a]
 + sum_{a} x[1, a] * v[a]
 + sum_{a} x[2, a] * v[a]

Note that this function can also be called on existing tensor objects, with the same semantics applied to their terms. Existing summations are not touched by it. For instance,

>>> tensor = dr.sum(x[a] * v[a])
>>> str(tensor)
'x[a] * v[a]'
>>> tensor = dr.sum((a, r), tensor)
>>> str(tensor)
'sum_{a} x[a] * v[a]'

where we have used a summation with only the summand (no sums) to create a simple tensor of only one term without any summation.
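The Cartesian-product expansion and the role of the predicate in the concrete summation mode can be sketched without Spark or SymPy. The function expand_concrete below is hypothetical; it works on plain strings and integers and only imitates the callable-summand form described above.

```python
from itertools import product


def expand_concrete(sums, summand, predicate=None):
    """Expand concrete summations into a list of summand values.

    Each entry of sums is a tuple (dummy, value1, value2, ...).
    """
    dummies = [s[0] for s in sums]
    domains = [s[1:] for s in sums]
    terms = []
    for values in product(*domains):
        # The same dictionary is given to the predicate and the summand.
        env = dict(zip(dummies, values))
        if predicate is not None and not predicate(env):
            continue
        terms.append(summand(env))
    return terms


terms = expand_concrete(
    [("a", 1, 2), ("b", 1, 2)],
    lambda env: "x[{a}, {b}]".format(**env),
    predicate=lambda env: env["a"] != env["b"],
)
print(terms)  # ['x[1, 2]', 'x[2, 1]']
```

Without the predicate, the two summations would give four terms; the predicate drops the two terms with equal indices.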

einst(summand) → drudge.drudge.Tensor[source]

Create a tensor from Einstein summation convention.

By calling this function, summations according to the Einstein summation convention will be added to the terms. Note that for a symbol to be recognized as a summation dummy, it must appear exactly twice in its original form in indices, and its range must be resolvable. When a symbol looks like an Einstein summation dummy but does not satisfy the requirements precisely, it will not be added as a summation, and a warning will be given for reference.

For instance, we can have the following fairly conventional Einstein form,

>>> dr = Drudge(SparkContext())
>>> r = Range('R')
>>> a, b, c = dr.set_dumms(r, symbols('a b c'))
>>> dr.add_resolver_for_dumms()
>>> x = IndexedBase('x')
>>> tensor = dr.einst(x[a, b] * x[b, c])
>>> str(tensor)
'sum_{b} x[a, b]*x[b, c]'

However, when a dummy is not in the most conventional form, the summations cannot be automatically added. For instance,

>>> tensor = dr.einst(x[a, b] * x[b, b])
>>> str(tensor)
'x[a, b]*x[b, b]'

b is not summed over since it is repeated three times. Note also that the range of a symbol must be resolvable for it to be summed automatically.

Note that in addition to creating tensors from scratch, this method can also be called on an existing tensor to add new summations. In that case, no existing summations will be touched.
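The rule that a dummy must appear exactly twice and have a resolvable range can be sketched as a counting exercise. The function einstein_dummies and the representation of factors as (base, indices) pairs are hypothetical simplifications; the real method works on SymPy expressions and also issues warnings, which this sketch omits.

```python
from collections import Counter


def einstein_dummies(factors, resolvable):
    """Return the indices that would be summed under the Einstein convention.

    factors is a list of (base, indices) pairs; resolvable is the set of
    symbols whose range can be resolved.
    """
    counts = Counter(idx for _, indices in factors for idx in indices)
    return sorted(
        idx for idx, n in counts.items() if n == 2 and idx in resolvable
    )


known = {"a", "b", "c"}

# x[a, b] * x[b, c]: b appears exactly twice, so it is summed.
print(einstein_dummies([("x", ("a", "b")), ("x", ("b", "c"))], known))  # ['b']

# x[a, b] * x[b, b]: b appears three times, so nothing is summed.
print(einstein_dummies([("x", ("a", "b")), ("x", ("b", "b"))], known))  # []
```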

create_tensor(terms)[source]

Create a tensor with the terms given in the argument.

The terms should be given as an iterable of Term objects. This function should not be necessary in user code.

define(*args) → drudge.drudge.TensorDef[source]

Make a tensor definition.

This is a helper method for the creation of TensorDef instances.

Parameters:
  • initial arguments – The left-hand side of the definition. It can be given as an indexed quantity, either SymPy Indexed instances or an indexed vector, with all the indices being plain symbols whose range is able to be resolved. Or a base can be given, followed by the symbol/range pairs for the external indices.
  • final argument – The definition of the LHS, which can be tensor instances, or anything capable of being interpreted as such. Note that no summation is going to be automatically added.

define_einst(*args) → drudge.drudge.TensorDef[source]

Make a tensor definition based on Einstein summation convention.

This is basically the same function as define(), except that the content will be interpreted according to the Einstein summation convention.

def_(*args) → drudge.drudge.TensorDef[source]

Make a tensor definition according to convention set in drudge.

This method is a convenient utility for making tensor definitions. Basically it calls define() or define_einst() according to the value of the property default_einst.

It is also the operation used for tensor definitions inside drudge scripts.

format_latex(inp, sep_lines=False, align_terms=False, proc=None, no_sum=False, scalar_mul='')[source]

Get the LaTeX form of a given tensor or tensor definition.

Subclasses should fine-tune the appearance of the resulting LaTeX form by overriding the methods _latex_sympy, _latex_vec, and _latex_vec_mul.

Parameters:
  • inp – The input tensor or tensor definition.
  • sep_lines – If terms should be put into separate lines by separating them with \\.
  • align_terms – If & is going to be prepended to each term to have them aligned. This option is intended for cases where the LaTeX form is going to be put inside environments supporting alignment.
  • proc – A callable to be called with the string of the original LaTeX formatting of each of the terms to return a processed final form. The callable is also going to be given keyword arguments term for the actual tensor term and idx for the index of the term within the tensor.
  • no_sum (bool) – If summation is going to be suppressed in the printing, useful for cases where a convention, like Einstein's, exists for the summations.
  • scalar_mul (str) – The text for scalar multiplication. By default, scalar multiplication is just rendered as juxtaposition. When a string is given for this argument, it is going to be placed between scalar factors and between the amplitude and the vectors. In LaTeX output of tensors with terms with many factors, the special command \invismult can be used, which just makes a small space but enables the factors to be automatically split by the breqn package.

report(filename, title)[source]

Make a report for results.

This function should be used within a with statement to open a report (Report) for results.

Parameters:
  • filename (str) – The name of the report file, whose extension gives the type of the report. Currently, .html gives reports in HTML format, where the MathJax library is used for rendering the math. .tex gives reports in LaTeX format, while .pdf will automatically compile the LaTeX source with the pdflatex program in the path. Normally for LaTeX output, finer tuning of the display environment in Report.add() is required, especially for long equations.
  • title – The title to be printed in the report.

Examples

>>> dr = Drudge(SparkContext())
>>> tensor = dr.sum(IndexedBase('x')[Symbol('a')])
>>> with dr.report('report.html', 'A simple tensor') as report:
...     report.add('Simple tensor', tensor)

pickle_env()[source]

Prepare the environment for unpickling contents with tensors.

Pickled contents containing tensors cannot be directly unpickled by the functions from the pickle module. They should be unpickled within the context created by this function. Note that the content does not have to be a single tensor object. Anything containing tensor objects needs to be loaded within the context.

Note that this context is unnecessary inside drudge scripts.

Warning

All tensors read within this environment will have the current drudge as their drudge. No checking is, or can be, done to make sure that the tensors are sensible for the current drudge. Normally, the drudge used here should be the same as the drudge used for their creation.

Examples

>>> dr = Drudge(SparkContext())
>>> tensor = dr.sum(IndexedBase('x')[Symbol('a')])
>>> import pickle
>>> serialized = pickle.dumps(tensor)
>>> with dr.pickle_env():
...     res = pickle.loads(serialized)
>>> print(tensor == res)
True

memoize(comput, filename, log=None, log_header='Memoize:')[source]

Preserve/lookup result of computation into/from pickle file.

When the file with the given name exists, an attempt is made to open and unpickle it, with the deserialized content returned and the given computation skipped. When the file is absent or does not contain a valid pickle, the given computation will be performed, with the result both pickled into a file created with the given name and returned.

Parameters:
  • comput – The callable giving the computation to be performed. To be called with no arguments.
  • filename – The name of the pickle file to read from or write to.
  • log – The file object to write log information to. None if no logging is desired, True if they are to be written to the standard output, or any writable file object can be given.
  • log_header – The header to be prepended to lines of the log texts.
Returns:
  The result of the computation, either read from existing file or newly computed.

Examples

>>> dr = Drudge(SparkContext())
>>> res = dr.memoize(lambda: 10, 'intermediate.pickle')
>>> res
10
>>> dr.memoize(lambda: 10, 'intermediate.pickle')
10

Note that in the second execution, the number 10 should be read from the file rather than being computed again. Normally, rather than a trivial number, expensive intermediate results can be memoized in this way so that the script can be restarted readily.
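The lookup-or-compute behaviour can be sketched with the standard pickle module alone. The function toy_memoize is a hypothetical simplification: it has no logging, handles only a few common failure modes, and does not set up the pickling environment needed for tensors.

```python
import os
import pickle
import tempfile


def toy_memoize(comput, filename):
    """Return the pickled result if readable, else compute and store it."""
    try:
        with open(filename, "rb") as fp:
            return pickle.load(fp)
    except (OSError, EOFError, pickle.UnpicklingError):
        # Missing, empty, or invalid file: fall through to the computation.
        pass
    result = comput()
    with open(filename, "wb") as fp:
        pickle.dump(result, fp)
    return result


calls = []


def expensive():
    calls.append(1)  # Record that the computation actually ran.
    return sum(range(10))


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "intermediate.pickle")
    first = toy_memoize(expensive, path)
    second = toy_memoize(expensive, path)

print(first, second, len(calls))  # 45 45 1
```

The second call reads the value back from the file, so the computation runs only once.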

inside_drs

If we are currently inside a drudge script.

exec_drs(src, filename='<unknown>')[source]

Execute the drudge script.

Drudge scripts are Python scripts tweaked to be executed in special environments. This domain-specific language is made for the convenience of users for simple tasks, especially for users unfamiliar with Python.

Being a Python script executed inside the current interpreter, a drudge script differs from normal Python scripts in that

  1. All integer literals are resolved into SymPy symbolic integers.
  2. Global names are resolved in the order of,
    • the name archive in the current drudge,
    • the special drudge script functions in the drudge,
    • the drudge package exported names,
    • the gristmill package exported names (if installed),
    • the SymPy exported names,
    • built-in Python names.
  3. All unresolved names are created as a special kind of symbolic object, which behaves basically like SymPy Symbol, but with differences,
    1. They can be directly subscripted, like IndexedBase.
    2. The def_as method can be used to make a tensor definition with such a symbol or its indexing on the left-hand side and the other operand on the right-hand side. The resulting definition is also added to the name archive of the drudge.
    3. The <= operator can be used similarly to def_as, except that the definition is not added to the archive. The result can be put into local variables.
  4. All left-shift augmented assignment <<= operations are replaced by calls to the def_as method.
  5. Some operations could have slightly different behaviour more suitable inside drudge scripts. For developers, the inside_drs property can be used to query if the function is called inside a drudge script.
  6. The pickling environment is set.
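
The layered name resolution in items 2 and 3 can be illustrated with a namespace that falls back through an ordered list of scopes and creates a symbol for anything left unresolved. This is a minimal sketch, not how drudge is necessarily implemented: ScriptEnv and PlaceholderSymbol are hypothetical names, and only two stand-in scopes plus the Python built-ins are modelled.

```python
import builtins


class PlaceholderSymbol:
    """Hypothetical stand-in for the symbolic object made for unresolved names."""

    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return f"PlaceholderSymbol({self.name!r})"


class ScriptEnv(dict):
    """Namespace that falls back through an ordered list of scopes."""

    def __init__(self, *scopes):
        super().__init__()
        self._scopes = scopes

    def __missing__(self, name):
        # Called only when the script itself has not assigned the name.
        for scope in self._scopes:
            if name in scope:
                return scope[name]
        # Nothing knows the name: create a symbol instead of raising NameError.
        return PlaceholderSymbol(name)


name_archive = {"x": 1}                 # stand-in for the drudge name archive
special_funcs = {"helper": "special"}   # stand-in for the script functions
env = ScriptEnv(name_archive, special_funcs, vars(builtins))

script = "a = x\nb = helper\nc = undefined_name\nd = len\n"
exec(script, {}, env)  # the environment is used as the local namespace
```

Names assigned by the script land in env itself, so later lookups find them before any of the fallback scopes are consulted.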

For a non-technical introduction to drudge script, please see Drudge scripts.

static simplify(arg, **kwargs)[source]

Make simplification for both SymPy expressions and tensors.

This method is mostly designed to be used in drudge scripts. The actual simplification is dispatched based on the type of the given argument: plain SymPy simplification for SymPy expressions, and drudge simplification for drudge tensors or tensor definitions.

Tensors

class drudge.Tensor(drudge: drudge.drudge.Drudge, terms: pyspark.rdd.RDD, free_vars: typing.Set[sympy.core.symbol.Symbol] = None, expanded=False, repartitioned=False)[source]

The main tensor class.

A tensor is an aggregate of terms distributed and managed by Spark. Here most operations needed for tensors are defined.

Normally, tensor instances are created from drudge methods or tensor operations. Direct invocation of its constructor is seldom needed in user scripts.

__init__(drudge: drudge.drudge.Drudge, terms: pyspark.rdd.RDD, free_vars: typing.Set[sympy.core.symbol.Symbol] = None, expanded=False, repartitioned=False)[source]

Initialize the tensor.

This function is not designed to be called by users directly. Tensor creation should be carried out by the factory functions in drudges and the operations defined here.

The default values for the keyword arguments are always the safest choice. For better performance, callers are encouraged to give proper consideration to all the keyword arguments.

drudge

The drudge that created the tensor.

terms

The terms in the tensor, as an RDD object.

Although for users, normally there is no need for direct manipulation of the terms, it is still exposed here for flexibility.

local_terms

Gather the terms locally into a list.

The list returned by this is read-only and should never be mutated.

Warning

This method will gather all terms into the memory of the driver.

n_terms

Get the number of terms.

Zero terms signify a zero tensor. Accessing this property will cause the tensor to be cached automatically.

cache()[source]

Cache the terms in the tensor.

This method should be called when this tensor is an intermediate result that will be used multiple times. The tensor itself will be returned for the ease of chaining.

repartition(num_partitions=None, cache=False)[source]

Repartition the terms across the Spark cluster.

This function should be called when the terms need to be rebalanced among the workers. Note that this incurs a Spark RDD shuffle operation and might be very expensive. Its invocation and the number of partitions used need to be fine-tuned for different problems to achieve good performance.

Parameters:
  • num_partitions (int) – The number of partitions. By default, the number is read from the drudge object.
  • cache (bool) – If the result is going to be cached.

is_scalar

If the tensor is a scalar.

A tensor is considered a scalar when none of its terms has a vector part. This property will make the tensor automatically cached.

free_vars

The free variables in the tensor.

expanded

If the tensor is already expanded.

repartitioned

If the terms in the tensor are already repartitioned.

has_base(base: typing.Union[sympy.tensor.indexed.IndexedBase, sympy.core.symbol.Symbol, drudge.term.Vec]) → bool[source]

Find if the tensor has the given scalar or vector base.

Parameters:
  • base – The base whose presence is to be queried. When it is an indexed base or a plain symbol, its presence in the amplitude part is tested. When it is a vector, its presence in the vector part is tested.

__repr__()[source]

Get the machine string representation.

In a normal execution environment, only the memory address is displayed, since the tensor may or may not be evaluated yet. In drudge scripts, the readable string representation is returned.

__str__()[source]

Get the string representation of the tensor.

Note that this function will gather all terms into the driver.

latex(**kwargs)[source]

Get the LaTeX form for the tensor.

The actual printing is dispatched to the drudge object for the convenience of tuning the appearance.

All keyword arguments are forwarded to the Drudge.format_latex() method.

display(if_return=True, **kwargs)[source]

Display the tensor in interactive IPython notebook sessions.

Parameters:
  • if_return – If the resulting equation is to be returned rather than directly displayed. It can be disabled for displaying an equation in the middle of a Jupyter cell.
  • kwargs – All the rest of the keyword arguments are forwarded to the Drudge.format_latex() method.

__getstate__()[source]

Get the current state of the tensor.

Here we just have the local terms. Other cached information is discarded.

__setstate__(state)[source]

Set the state for the new tensor.

This function reads the drudge to use from the module attribute, which is set in the Drudge.pickle_env() method.

apply(func, **kwargs)[source]

Apply the given function to the RDD of terms.

This function is analogous to the _replace method of Python named tuples: for any tensor initializer argument that is not given, the same value from self is used. The terms get special treatment since they are the centre of tensor objects. The drudge is always kept the same.

Users generally do not need this method. It is exposed here just for flexibility and convenience.

Warning

For developers: Note that the resulting tensor will inherit all unspecified keyword arguments from self. This method can give unexpected results if certain arguments are not correctly reset when they need to be. For instance, when expanded is not reset when the result is no longer guaranteed to be in expanded form, later expansions could be skipped when they actually need to be performed.

So all functions using this method need to be reviewed when new properties are added to the tensor class. Direct invocation of the tensor constructor is a much safer alternative.
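
The inheritance pitfall in this warning is the same one that named tuples show. The following sketch uses a hypothetical two-field MiniTensor, not the real Tensor class, to show how a status flag goes stale when only the terms are replaced.

```python
from collections import namedtuple

# Hypothetical miniature of a tensor: a tuple of terms plus a status flag.
MiniTensor = namedtuple("MiniTensor", ["terms", "expanded"])

original = MiniTensor(terms=("a*x", "b*x"), expanded=True)

# Only the terms are replaced; `expanded` is silently inherited and now wrong.
stale = original._replace(terms=("(a + b)*x",))

# Resetting the flag explicitly keeps the metadata truthful.
safe = original._replace(terms=("(a + b)*x",), expanded=False)
```

Here stale.expanded is still True even though its single term is a sum that has not been expanded.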

reset_dumms(excl=None)[source]

Reset the dummies.

The dummies will be set to the canonical dummies according to the order in the summation list. This method is especially useful on canonicalized tensors.

Parameters:
  • excl – A set of symbols to be excluded in the dummy selection. This option can be useful when some symbols already used as dummies are planned to be used for other purposes.

simplify_amps()[source]

Simplify the amplitudes in the tensor.

This method simplifies the amplitude in the terms of the tensor by using the facility from SymPy. The zero terms will be filtered out as well.

simplify_deltas()[source]

Simplify the deltas in the tensor.

An attempt will be made to simplify Kronecker deltas whose operands contain dummies.

simplify_sums()[source]

Simplify the summations in the tensor.

Currently, only bounded summations whose dummies are not involved in the term will be replaced by a multiplication by the size of the range.
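
As a small worked case of this rule, assume a range \(R\) with lower bound \(0\) and upper bound \(n\), so that its size is \(n\), and a summand that does not depend on the dummy \(i\) (the symbols here are chosen only for illustration):

\[\sum_{i \in R} x_j = n \, x_j\]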

expand()[source]

Expand the terms in the tensor.

By calling this method, terms in the tensor whose amplitude is the addition of multiple parts will be expanded into multiple terms.

sort()[source]

Sort the terms in the tensor.

The terms will generally be sorted according to increasing complexity.

merge()[source]

Merge terms with the same vector and summation part.

This function merges terms only when their summation list and vector part are syntactically the same. So it is more useful when the canonicalization has been performed and the dummies reset.
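
Why syntactic identity matters can be seen in a toy model. The sketch below assumes a hypothetical representation of a term as a (sums, amplitude, vectors) triple with integer amplitudes; it is not the drudge data structure.

```python
from collections import defaultdict


def merge_terms(terms):
    """Add up amplitudes of terms sharing the same (sums, vectors) key."""
    merged = defaultdict(int)
    for sums, amp, vecs in terms:
        merged[(sums, vecs)] += amp
    return [(sums, amp, vecs) for (sums, vecs), amp in merged.items()]


terms = [
    (("i",), 2, ("v[i]",)),
    (("i",), 3, ("v[i]",)),
    (("j",), 1, ("v[j]",)),  # mathematically the same sum, different dummy name
]
merged = merge_terms(terms)
```

The first two terms are combined, while the third stays separate until its dummy is reset to the same name as the others.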

canon()[source]

Canonicalize the terms in the tensor.

This method will first expand the terms in the tensor. Then the canonicalization algorithm is going to be applied to each of the terms. Note that this method does not rename the dummies.

normal_order()[source]

Normal order the terms in the tensor.

The actual work is dispatched to the drudge, which has domain-specific knowledge about the noncommutativity of the vectors.

simplify()[source]

Simplify the tensor.

This is the master driver function for tensor simplification. Inside drudge scripts, it also evaluates the result eagerly and repartitions the terms among the Spark workers, with the result cached. This is for the ease of users unfamiliar with the Spark lazy execution model.

__eq__(other)[source]

Compare the equality of tensors.

Note that this function only compares the syntactical equality of tensors. Mathematically equal tensors might be compared to be unequal by this function when they are not simplified.

Note that only comparison with zero is performed by counting the number of terms in the distributed tensor. Otherwise, this function gathers all terms in both tensors and can be very expensive. So direct comparison of two tensors is mostly suitable for testing and debugging on small problems only. For large-scale problems, it is advised to compare the simplified difference with zero.

__add__(other)[source]

Add the two tensors together.

The terms in the two tensors will be concatenated together, without any further processing.

In addition to full tensors, tensor inputs can also be directly added.

__radd__(other)[source]

Add tensor with something in front.

__sub__(other)[source]

Subtract another tensor from this tensor.

__rsub__(other)[source]

Subtract the tensor from another quantity.

__neg__()[source]

Negate the current tensor.

The result will be equivalent to multiplication with \(-1\).

__mul__(other) → drudge.drudge.Tensor[source]

Multiply the tensor.

This multiplication operation is done completely within the framework of free algebras. The vectors are only concatenated without further processing. The actual handling of the commutativity should be carried out at the normal ordering operation for different problems.

In addition to full tensors, tensors can also be multiplied by user tensor input directly.

__rmul__(other)[source]

Multiply the tensor on the right.

__or__(other)[source]

Compute the commutator with another tensor.

In the same way as multiplication, this can be used for both full tensors and local tensor input.
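
The free-algebra multiplication and the commutator built on it can be sketched with a toy class. FreeAlg is a hypothetical illustration that stores words (tuples of symbols) with integer coefficients; it says nothing about how drudge stores terms.

```python
class FreeAlg:
    """Hypothetical toy element: maps words (tuples of symbols) to coefficients."""

    def __init__(self, terms):
        # Drop zero coefficients so that a vanishing element has no terms.
        self.terms = {word: c for word, c in terms.items() if c != 0}

    def __mul__(self, other):
        result = {}
        for w1, c1 in self.terms.items():
            for w2, c2 in other.terms.items():
                word = w1 + w2  # plain concatenation, no reordering
                result[word] = result.get(word, 0) + c1 * c2
        return FreeAlg(result)

    def __sub__(self, other):
        result = dict(self.terms)
        for word, c in other.terms.items():
            result[word] = result.get(word, 0) - c
        return FreeAlg(result)

    def __or__(self, other):
        # Commutator [self, other] = self * other - other * self.
        return self * other - other * self


a = FreeAlg({("a",): 1})
b = FreeAlg({("b",): 1})
comm = a | b
```

Since no commutation rule is applied, a | b keeps both orderings with opposite signs, and only a later normal ordering step could reduce it further.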

__ror__(other)[source]

Compute the commutator with another tensor on the right.

__truediv__(other)[source]

Divide tensor by a scalar quantity.

__rtruediv__(other)[source]

Divide another quantity by the tensor.

subst(lhs, rhs, wilds=None, full_balance=False, excl=None)[source]

Substitute all appearances of the defined tensor.

When the given LHS is a plain SymPy symbol, all its appearances in the amplitude of the tensor will be replaced. The LHS can also be an indexed SymPy expression or an indexed vector, in which case every appearance of the indexed base or vector base will be matched against the indices on the LHS. When the matching succeeds for all the indices, the RHS, with the substitution found in the matching applied, will replace the indexed base in the amplitude, or the vector. Note that for a scalar LHS, the RHS must contain no vector.

Since we do not commonly define tensors with wild symbols, the option wilds can be used to give a mapping translating plain symbols on the LHS and the RHS to the wild symbols to be used. The default value of None makes all plain symbols in the indices of the LHS be translated into a wild symbol with the same name and no exclusion. An empty dictionary can be used to disable all such automatic translation. The default value of None should satisfy most needs.

Examples

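For instance, we can have a very simple tensor, the outer product of a vector with itself,

>>> from pyspark import SparkContext
>>> from sympy import IndexedBase, symbols
>>> from drudge import Drudge, Range, Vec
>>> dr = Drudge(SparkContext())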
>>> r = Range('R')
>>> a, b = dr.set_dumms(r, symbols('a b c d e f'))[:2]
>>> dr.add_default_resolver(r)
>>> x = IndexedBase('x')
>>> v = Vec('v')
>>> tensor = dr.einst(x[a] * x[b] * v[a] * v[b])
>>> str(tensor)
'sum_{a, b} x[a]*x[b] * v[a] * v[b]'

We can replace the indexed base by the product of a matrix with another indexed base,

>>> o = IndexedBase('o')
>>> y = IndexedBase('y')
>>> res = tensor.subst(x[a], dr.einst(o[a, b] * y[b]))
>>> str(res)
'sum_{a, b, c, d} y[c]*y[d]*o[a, c]*o[b, d] * v[a] * v[b]'

We can also make substitution on the vectors,

>>> w = Vec('w')
>>> res = tensor.subst(v[a], dr.einst(o[a, b] * w[b]))
>>> str(res)
'sum_{a, b, c, d} x[a]*x[b]*o[a, c]*o[b, d] * w[c] * w[d]'

After the substitution, we can always make a simplification, at least to make the naming of the dummies more aesthetically pleasing,

>>> res = res.simplify()
>>> str(res)
'sum_{a, b, c, d} x[c]*x[d]*o[c, a]*o[d, b] * w[a] * w[b]'

subst_all(defs, simplify=False, full_balance=False, excl=None)[source]

Substitute all given definitions serially.

The definitions should be given as an iterable of either TensorDef instances or pairs of left-hand side and right-hand side of the substitutions. Note that the substitutions are going to be performed according to the given order one-by-one, rather than simultaneously.
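
The difference between serial and simultaneous substitution can be seen with a toy model in which an expression is just a list of tokens. Both helper functions below are hypothetical and only illustrate the ordering issue.

```python
def subst_serial(tokens, defs):
    """Apply each (lhs, rhs) definition in turn to a list of tokens."""
    for lhs, rhs in defs:
        tokens = [tok for t in tokens for tok in (rhs if t == lhs else [t])]
    return tokens


def subst_simultaneous(tokens, defs):
    """Replace every token once, using the original definitions only."""
    table = dict(defs)
    return [tok for t in tokens for tok in table.get(t, [t])]


expr = ["x", "y"]
defs = [("x", ["y"]), ("y", ["z"])]

serial = subst_serial(expr, defs)              # x -> y first, then every y -> z
simultaneous = subst_simultaneous(expr, defs)  # x -> y and y -> z in one pass
```

In the serial case, the y introduced by the first definition is itself rewritten by the second, so the order of the definitions affects the result.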

rewrite(vecs, new_amp)[source]

Rewrite terms with the given vectors in terms of the new amplitude.

This method will rewrite the terms whose vector part matches the given vectors in terms of the given new amplitude. All terms rewritten into the same form will be aggregated into a single term.

Parameters:
  • vecs – A vector or a product of vectors. They should be written in terms of SymPy wild symbols when they need to be matched against different actual vectors.
  • new_amp – The amplitude that the matched terms should have. They are usually written in terms of the same wild symbols as the wilds in the vectors.
Returns:

  • rewritten – The tensor with the requested terms rewritten in terms of the given amplitude.
  • defs – The actual definitions of the rewritten amplitude. One for each rewritten term in the result.

diff(variable, real=False, wirtinger_conj=False)[source]

Differentiate the tensor to get the analytic gradient.

This function supports evaluating the derivative with respect to either a plain symbol or a tensor component. This is achieved by delegating the core differentiation operation to SymPy, so a very wide range of expressions is supported.

Warning

For non-analytic complex functions, this function gives the Wirtinger derivative with respect to the given variable only. The other non-vanishing derivative with respect to the conjugate needs to be evaluated by another invocation with wirtinger_conj set to true.
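
For reference, with \(z = x + i y\), the two Wirtinger derivatives are conventionally defined as

\[\frac{\partial}{\partial z} = \frac{1}{2}\left(\frac{\partial}{\partial x} - i \frac{\partial}{\partial y}\right), \qquad \frac{\partial}{\partial \bar{z}} = \frac{1}{2}\left(\frac{\partial}{\partial x} + i \frac{\partial}{\partial y}\right)\]

For the non-analytic function \(f = z \bar{z}\), treating \(z\) and \(\bar{z}\) as independent gives

\[\frac{\partial f}{\partial z} = \bar{z}, \qquad \frac{\partial f}{\partial \bar{z}} = z\]

so neither derivative vanishes, and one invocation is needed for each.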

Warning

The differentiation algorithm currently does not take into account the symmetry of the tensor being differentiated with respect to. When differentiating with respect to a symmetric tensor, further symmetrization of the result might be needed.

Parameters:
  • variable – The variable to differentiate with respect to. It should be either a plain SymPy symbol or an indexed quantity. When it is an indexed quantity, the indices should be plain symbols with resolvable range.
  • real (bool) – If the variable is going to be assumed to be real. Real variables have conjugates equal to themselves.
  • wirtinger_conj (bool) – If we evaluate the Wirtinger derivative with respect to the conjugate of the variable.

filter(crit)[source]

Filter the terms, keeping those satisfying the given criterion.

The criterion needs to be a callable accepting Term objects. In the result, only terms for which the given criterion function evaluates to True will be retained.

map(func)[source]

Map the given function to the terms in the tensor.

The given function should take a Term object and return a Term object. The resulting tensor has the result of the application of the given callable to each of the terms as its terms.

bind(func)[source]

Map the given function to the terms and flatten.

The given callable needs to return an iterable of Term objects for each of the terms in the current tensor. It is applied to each of the terms, and all the terms from all the results form the terms of the resulting tensor. This is the bind operation for the list monad in Haskell, or flatMap in the Java/Scala/Spark collection APIs.

filter() and map() can be understood as special cases of this method.
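
The relationship between the three operations can be sketched on plain Python lists. The function names here are hypothetical and integers stand in for Term objects.

```python
def bind(terms, func):
    """Apply func to every term and flatten the resulting iterables."""
    return [out for term in terms for out in func(term)]


def map_terms(terms, func):
    # map: each term yields exactly one result.
    return bind(terms, lambda t: [func(t)])


def filter_terms(terms, crit):
    # filter: each term yields itself or nothing.
    return bind(terms, lambda t: [t] if crit(t) else [])


terms = [1, 2, 3, 4]
doubled = map_terms(terms, lambda t: 2 * t)
evens = filter_terms(terms, lambda t: t % 2 == 0)
split = bind(terms, lambda t: [t, -t])
```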

map2scalars(action, skip_vecs=False)[source]

Map the given action to the scalars in the tensor.

The given action should take a SymPy expression and return a SymPy expression. It is applied to the amplitude of each term and to the indices of the vectors in the tensor. Note that this function is intended mainly for free variables; it does not change the summations in the terms or the dummies.

Parameters:
  • action – The callable to be applied to the scalars inside the tensor.
  • skip_vecs – When it is set, the callable will no longer be mapped to the indices of the vectors. It can be used to boost performance when we know that the action does not need to be applied to the indices.

__getattr__(item)[source]

Try to see if the item is a tensor method from the drudge.

This enables individual drudges to dynamically add domain-specific operations on tensors.
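
The delegation pattern can be sketched as follows. ToyDrudge, ToyTensor, and the tensor_methods dictionary are all hypothetical names for illustration; the actual registry used by drudge may differ.

```python
from functools import partial


class ToyDrudge:
    """Hypothetical drudge holding extra tensor methods by name."""

    def __init__(self):
        self.tensor_methods = {}


class ToyTensor:
    def __init__(self, drudge, terms):
        self.drudge = drudge
        self.terms = terms

    def __getattr__(self, name):
        # Only reached when normal attribute lookup fails.
        try:
            method = self.drudge.tensor_methods[name]
        except KeyError:
            raise AttributeError(name) from None
        # Bind the tensor as the first argument, like an ordinary method.
        return partial(method, self)


dr = ToyDrudge()
dr.tensor_methods["count"] = lambda tensor: len(tensor.terms)

t = ToyTensor(dr, ["a", "b", "c"])
n = t.count()
```

Raising AttributeError for unknown names keeps hasattr() and similar protocols working as usual.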

Tensor definitions

class drudge.TensorDef(base, exts, tensor: drudge.drudge.Tensor)[source]

Definition of a tensor.

A tensor definition is basically a tensor with a name. In addition to being a tensor, a tensor definition also has a left-hand side. When the tensor is zero-order, the left-hand side is simply a symbol. When it has external indices, the base and external indices for it are both stored. The base is a Vec instance for tensors with a vector part, or a SymPy IndexedBase for scalar tensors. For instance,

\[\sum_j o_{i, j} f_j\]

can be construed as a tensor. By storing it as a Tensor object, we can perform mathematical manipulations on it. With an explicit left-hand side,

\[t_i = \sum_j o_{i, j} f_j\]

it is now a tensor definition, which can be handled by the current class.

A tensor definition is a subclass of tensor. With explicit storage of a left-hand side, it can be convenient to be used for Tensor.subst() or Tensor.subst_all() method, or it can be directly indexed. For example, with the above definition stored in t_def, for tensor v holding \(\sum_j u_{i, j} t_j\),

v.subst(t_def.lhs, t_def.rhs)

or

v.subst_all([t_def])

or, with the act() method,

t_def.act(v)

we can get

\[\sum_{j, k} u_{i, j} o_{j, k} f_k\]

Tensor definitions can also be directly subscripted, like t_def[1] gives \(\sum_i o_{1, i} f_i\).

By being a Tensor subclass, all tensor manipulations are supported. Just the result will not automatically be a definition.

__init__(base, exts, tensor: drudge.drudge.Tensor)[source]

Initialize the tensor definition.

In the same way as the initializer for the Tensor class, this initializer is also unlikely to be used directly in user code. Drudge methods Drudge.define() and Drudge.define_einst(), and their wrapper Drudge.def_() can be more convenient. Inside drudge scripts, operators <<= or <= can be used; see Drudge scripts and Drudge.exec_drs().

Parameters:
  • base – The base for the definition. It will be normalized to the correct type depending on the external indices and the right-hand side.
  • exts – The iterable for external indices. They can be either symbol/range pairs for external indices with explicit range, or they can also be a plain symbol for generic definitions.
  • tensor – The RHS of the definition.

rhs

Get the right-hand-side of the definition.

The result is the definition itself. Kept here for backward compatibility.

rhs_terms

Gather the terms on the right-hand-side of the definition.

lhs

Get the standard left-hand-side of the definition.

base

The base of the tensor definition.

exts

The external indices.

simplify()[source]

Simplify the tensor in the definition.

reset_dumms(excl=None)[source]

Reset the dummies in the definition.

The external indices will take higher precedence over the summed indices inside the right-hand side.

__eq__(other)[source]

Compare two tensor definitions for equality.

Note that similar to the equality comparison of tensors, here we only compare the syntactic equality rather than the mathematical equality. The left-hand side is taken into consideration only for comparison with another tensor definition.

__str__()[source]

Form simple readable string for a definition.

latex(**kwargs)[source]

Get the LaTeX form for the tensor definition.

The result will just be the form from Tensor.latex() with the LHS prepended.

Parameters:
  • kwargs – All keyword parameters are forwarded to the Drudge.format_latex() method.

display(if_return=True, **kwargs)[source]

Display the tensor definition in interactive notebook sessions.

The parameters here all have the same meaning as in Tensor.display().

act(tensor, wilds=None, full_balance=False)[source]

Act the definition on a tensor.

This method is the active voice version of the Tensor.subst() function. All appearances of the defined object in the tensor will be substituted.

__getitem__(item)[source]

Get the tensor when the definition is indexed.

__getstate__()[source]

Get the current state of the definition.

__setstate__(state)[source]

Set the state for the new definition.

Miscellaneous utilities

Mathematical manipulations

drudge.sum_(obj)[source]

Sum the values in the given iterable.

Different from the built-in summation function, the summation here starts from the first item in the iterable. A SymPy integer zero is returned when the iterable is empty.

drudge.prod_(obj)[source]

Multiply the values in the given iterable.

Similar to the summation utility function sum_(), here the initial value for the reduction is the first element. Unlike the summation, which gives zero, a SymPy integer unity is returned for an empty iterable.
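
Starting the reduction from the first element matters for objects that cannot be added to the integer 0. The sketch below illustrates the idea with hypothetical helper names; it returns plain Python integers for empty input, whereas the functions documented here return SymPy integers.

```python
import operator
from functools import reduce


def sum_from_first(items, empty=0):
    """Sum items starting from the first one; return `empty` if there are none."""
    items = iter(items)
    try:
        first = next(items)
    except StopIteration:
        return empty
    return reduce(operator.add, items, first)


def prod_from_first(items, empty=1):
    """Multiply items starting from the first one; return `empty` if there are none."""
    items = iter(items)
    try:
        first = next(items)
    except StopIteration:
        return empty
    return reduce(operator.mul, items, first)


# Lists cannot be added to 0, so the built-in sum() would raise TypeError here.
joined = sum_from_first([[1], [2, 3]])
product = prod_from_first([2, 3, 4])
```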

Timing utilities

class drudge.Stopwatch(print_cb=<built-in function print>)[source]

Utility class for printing timing information.

This class helps with timing the progress of batch jobs. It is capable of getting and formatting the elapsed wall time between consecutive steps. Note that the timing here might not be accurate to one second.

__init__(print_cb=<built-in function print>)[source]

Initialize the stopwatch.

Parameters:
  • print_cb – The function to be called with the formatted time-stamp. By default, it will just be written to stdout.

tick(total=False)[source]

Reset the timer.

Parameters:
  • total – If the total beginning time is going to be reset as well.

tock(label, tensor=None)[source]

Make a timestamp.

The formatted timestamp will be given to the callback of the current stopwatch. The wall time elapsed since the last tick() will be printed.

Parameters:
  • label – The label for the current step.
  • tensor – When a tensor is given, it will be cached and its number of terms counted. This parameter exists because, if no reduction is performed on the tensor, it might remain unevaluated inside Spark and give misleading timing information.

tock_total()[source]

Make a timestamp for the total time.

The total time will be the time elapsed since the total time was last reset.
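
The tick/tock protocol can be sketched with the standard library alone. MiniStopwatch is a hypothetical illustration: the message format is invented, the tensor parameter is omitted, and resetting the step timer inside tock() is an assumption made so that consecutive steps are timed separately.

```python
import time


class MiniStopwatch:
    """Hypothetical stopwatch reporting wall time through a callback."""

    def __init__(self, print_cb=print):
        self._print = print_cb
        self.tick(total=True)

    def tick(self, total=False):
        # Reset the step timer, and optionally the total timer as well.
        now = time.perf_counter()
        self._last = now
        if total:
            self._begin = now

    def tock(self, label):
        elapsed = time.perf_counter() - self._last
        self._print(f"{label} done, wall time: {elapsed:.2f} s")
        self.tick()  # assumed: the next step is timed from here

    def tock_total(self):
        elapsed = time.perf_counter() - self._begin
        self._print(f"Total wall time: {elapsed:.2f} s")


messages = []
stopwatch = MiniStopwatch(print_cb=messages.append)
stopwatch.tock("First step")
stopwatch.tock_total()
```

Passing a custom callback, as done here with messages.append, lets the timestamps go to a log instead of stdout.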

Output generation

class drudge.Report(filename: str, title)[source]

Simple report for outputting drudge results.

This class helps to write symbolic results to files for batch processing jobs. It is not recommended to be used directly. Users should instead use the method provided by the drudge class in with statements.

__init__(filename: str, title)[source]

Initialize the report object.

add(title=None, content=None, description=None, env='[', **kwargs)[source]

Add a section to the result.

Parameters:
  • title – The title of the equation. It will be used as a section header. When it is given as a None, the section header will not be added.
  • content – The actual tensor or tensor definition to be printed. It can be given as a None to skip any equation rendering.
  • description – A verbal description of the content. It will be typeset before the actual equation as normal text. A None value will cause it to be suppressed.
  • env – The environment to put the equation in. A value of '[' will use \[ and \] as the delimiters of the math environment. Other values will be put inside the common \begin{} and \end{} tags of LaTeX.
  • kwargs – All the rest of the keyword arguments are forwarded to the Drudge.format_latex() method.

Note

For long equations in LaTeX environments, normally env='align' and sep_lines=True can be set to allow each term to occupy a separate line, with automatic page breaks inserted. Alternatively, env='dmath' and sep_lines=False can be used to let the breqn package flow the terms automatically.
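
The handling of the env parameter described above amounts to choosing the delimiters around the equation body. A minimal sketch, with wrap_equation as a hypothetical helper rather than part of the Report class:

```python
def wrap_equation(latex, env="["):
    """Wrap a LaTeX equation body in the requested math environment."""
    if env == "[":
        # Display math with the \[ ... \] delimiters.
        return "\\[\n" + latex + "\n\\]"
    # Any other value names a LaTeX environment, such as align or dmath.
    return f"\\begin{{{env}}}\n{latex}\n\\end{{{env}}}"


default_form = wrap_equation("a = b")
align_form = wrap_equation("a &= b", env="align")
```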

write()[source]

Write the report.

Note that this method also closes the output file.

class drudge.ScalarLatexPrinter(settings=None)[source]

Specialized LaTeX printers for usage in drudge.

Basically this class tries to fix some problems with using the original LaTeX printer from SymPy in common drudge tasks.

Specifically, for indexed objects, if the base already contains a subscript, it will be raised into a superscript wrapped inside a pair of parentheses.