<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Sheaves</title>
    <description>Math, Language, Programming</description>
    <link>http://sheaves.github.io</link>
    <atom:link href="http://sheaves.github.io/feed.xml" rel="self" type="application/rss+xml" />
    
      
      <item>
        <title>Tensor associated to a database</title>
        <description>&lt;p&gt;In this post (the 2nd in &lt;a href=&quot;https://sheaves.github.io/topics/#Tensors+and+Language+Models&quot; target=&quot;_blank&quot;&gt;this series on Tensors and Language Models&lt;/a&gt;), we will be interested in collections of tables or dataframes that look something like:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/images/database_example.png&quot; alt=&quot;Example of a database&quot; title=&quot;Example of a database&quot; /&gt;&lt;/p&gt;

&lt;p&gt;We will call these &lt;em&gt;databases&lt;/em&gt;. 
This example database comes from the paper on &lt;a href=&quot;https://arxiv.org/abs/2502.05076&quot; target=&quot;_blank&quot;&gt;quantifying the knowledge capacity of attention layers&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;By the end of this post, we will see how to define a &lt;em&gt;tensor&lt;/em&gt; associated to a database.
But first, we will begin by defining a &lt;em&gt;matrix&lt;/em&gt; associated to a function $f: X \to Y$.&lt;/p&gt;

&lt;!--more--&gt;

&lt;h3 id=&quot;matrix-representation-of-a-function&quot;&gt;Matrix representation of a function&lt;/h3&gt;
&lt;p&gt;Suppose we have a set $X = \{a,b,c\}$, another set $Y = \{m,s\}$ and a function $f \colon X \to Y$ given by&lt;/p&gt;

\[\begin{align}
  f \colon X &amp;amp;\to Y \\
  a &amp;amp;\mapsto s \\
  b &amp;amp;\mapsto s \\
  c &amp;amp; \mapsto m
\end{align}\]

&lt;p&gt;Functions are columns in databases. For example, $f$ above is the column &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;born_in&lt;/code&gt; if we replace $a$ with Astrid,  $b$ with Bernard, and so on. For such a function, we may define an $|X| \times |Y|$ matrix, $M_f$, with entries given by:&lt;/p&gt;

\[(M_f)_{xy} =
		\begin{cases}
			1 \mbox{ if } f(x) = y,\\
			0 \mbox{ otherwise}.
		\end{cases}\]

&lt;p&gt;For the function $f$ above, we would have&lt;/p&gt;

\[M_f = 
		\begin{array}{c@{}c}
			&amp;amp;
			\begin{array}{cc}
				m &amp;amp; s
			\end{array}
		\\
			\begin{array}{c}
				a \\ b \\ c
			\end{array}
			&amp;amp;
			\left(
				\begin{array}{cc}
					0 &amp;amp; 1 
					\\ 
					0 &amp;amp; 1 
          \\
          1 &amp;amp; 0
				\end{array} 
			\right)
		\end{array}\]

&lt;p&gt;The matrix $M_f$ &lt;em&gt;represents&lt;/em&gt; the function $f$ in the following sense. Represent the elements of $X$ as the vectors $e_a = (1,0,0),\, e_b = (0,1,0),\, e_c=(0,0,1)$ in $\mathbb{R}^{|X|}$, and the elements of $Y$ as the vectors $e_m = (1,0),\, e_s = (0,1)$ in $\mathbb{R}^{|Y|}$ (i.e. use a one-hot encoding). Then we have&lt;/p&gt;

\[e_x \, M_f = e_{f(x)}.\]

&lt;p&gt;For example, one can verify that $e_a\, M_f = e_s = e_{f(a)}$. 
Thus $M_f$ stores the values of $f$, and can be used to compute $f$.&lt;/p&gt;
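
&lt;p&gt;As a minimal sketch of the construction above, the following pure-Python snippet builds $M_f$ from the dictionary form of $f$ and checks that $e_x \, M_f = e_{f(x)}$ for every $x$. The helper names &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;one_hot&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;function_matrix&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;vec_mat&lt;/code&gt; are made up for this illustration, and the column order $(m, s)$ is assumed to match the matrix shown earlier.&lt;/p&gt;

```python
# Illustrative sketch; the helper names below are hypothetical, not from any library.
X = ['a', 'b', 'c']                      # row labels of M_f
Y = ['m', 's']                           # column labels of M_f
f = {'a': 's', 'b': 's', 'c': 'm'}       # the function f as a dictionary


def one_hot(element, labels):
    """Return the one-hot row vector of `element` with respect to `labels`."""
    return [1 if label == element else 0 for label in labels]


def function_matrix(f, X, Y):
    """Return M_f as a list of rows: row x is the one-hot encoding of f(x)."""
    return [one_hot(f[x], Y) for x in X]


def vec_mat(v, M):
    """Multiply a row vector v by a matrix M (both plain Python lists)."""
    return [sum(v[i] * M[i][j] for i in range(len(M))) for j in range(len(M[0]))]


M_f = function_matrix(f, X, Y)

# Check e_x M_f = e_{f(x)} for every x in X.
checks = [vec_mat(one_hot(x, X), M_f) == one_hot(f[x], Y) for x in X]
print(M_f, all(checks))
```

&lt;p&gt;Multiplying by a one-hot vector simply selects a row of $M_f$, which is why the check succeeds for every input.&lt;/p&gt;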

&lt;h3 id=&quot;the-rank-of-m_f&quot;&gt;The rank of $M_f$&lt;/h3&gt;
&lt;p&gt;Since $M_f$ is a matrix, we can compute its &lt;a href=&quot;https://en.wikipedia.org/wiki/Rank_(linear_algebra)&quot; target=&quot;_blank&quot;&gt;rank&lt;/a&gt;.
There are a few &lt;a href=&quot;https://en.wikipedia.org/wiki/Rank_(linear_algebra)#Alternative_definitions&quot; target=&quot;_blank&quot;&gt;equivalent definitions&lt;/a&gt; for the rank of a matrix, but we will use the &lt;a href=&quot;https://en.wikipedia.org/wiki/Rank_(linear_algebra)#Tensor_rank_%E2%80%93_minimum_number_of_simple_tensors&quot; target=&quot;_blank&quot;&gt;one that generalizes most easily to tensors&lt;/a&gt;, since tensors are where we’re headed next.&lt;/p&gt;

&lt;p&gt;In order to state this definition, we first define a rank-1 matrix to be a non-zero matrix that can be expressed as a matrix product $c \cdot r$ of a column vector $c$ and a row vector $r$. The following example shows a rank-1 matrix and its decomposition into $c$ and $r$:&lt;/p&gt;

\[\left(
  \begin{array}{cc}
    0 &amp;amp; 0 
    \\ 
    0 &amp;amp; 0 
    \\
    1 &amp;amp; 0
  \end{array} 
\right)
= 
\left( \begin{array}{c} 0 \\ 0 \\ 1 \end{array} \right) 
\left( \begin{array}{cc} 1 &amp;amp; 0 \end{array} \right)\]

&lt;p&gt;The rank of a matrix $M$ is then the smallest $k$ such that $M$ can be expressed as a sum of $k$ rank-1 matrices.&lt;/p&gt;

&lt;p&gt;From this, we can see that an upper bound for the rank of $M$ is simply the number of non-zero entries it has, since $M$ can be expressed as a sum of matrices that contain only 1 non-zero entry.  For example, the matrix $M_f$ above can be written as the sum:&lt;/p&gt;

\[M_f = 
\left(
  \begin{array}{cc}
    0 &amp;amp; 1 
    \\ 
    0 &amp;amp; 0 
    \\
    0 &amp;amp; 0
  \end{array} 
\right)
+
\left(
  \begin{array}{cc}
    0 &amp;amp; 0 
    \\ 
    0 &amp;amp; 1 
    \\
    0 &amp;amp; 0
  \end{array} 
\right)
+
\left(
  \begin{array}{cc}
    0 &amp;amp; 0 
    \\ 
    0 &amp;amp; 0 
    \\
    1 &amp;amp; 0
  \end{array} 
\right)\]

&lt;p&gt;and thus $\mathrm{rank}(M_f) \leq 3$.&lt;/p&gt;

&lt;p&gt;But in fact, the rank of $M_f$ is 2. It cannot be 1, since $M_f$ has two linearly independent columns, and it is at most 2 because the following is also a rank-1 matrix:&lt;/p&gt;

\[\left(
  \begin{array}{cc}
    0 &amp;amp; 1 
    \\ 
    0 &amp;amp; 1 
    \\
    0 &amp;amp; 0
  \end{array} 
\right)
= 
\left( \begin{array}{c} 1 \\ 1 \\ 0 \end{array} \right) 
\left( \begin{array}{cc} 0 &amp;amp; 1 \end{array} \right)\]

&lt;p&gt;and we can decompose $M_f$ as&lt;/p&gt;

\[M_f = 
\left(
  \begin{array}{cc}
    0 &amp;amp; 1 
    \\ 
    0 &amp;amp; 1 
    \\
    0 &amp;amp; 0
  \end{array} 
\right)
+
\left(
  \begin{array}{cc}
    0 &amp;amp; 0 
    \\ 
    0 &amp;amp; 0 
    \\
    1 &amp;amp; 0
  \end{array} 
\right).\]

&lt;p&gt;In fact, for any function $f : X \to Y$ between finite sets, we can always decompose $M_f$ into a sum of $k = |\mathrm{range}(f)|$ rank-1 matrices, one for each element in the &lt;a href=&quot;https://en.wikipedia.org/wiki/Range_of_a_function&quot; target=&quot;_blank&quot;&gt;range&lt;/a&gt; of $f$. No smaller number suffices, because the $k$ non-zero columns of $M_f$ are linearly independent, so that we have:&lt;/p&gt;

\[\mathrm{rank}(M_f) = |\mathrm{range}(f)|.\]

&lt;p&gt;We thus see that the rank of $M_f$ gives us some notion of the size of $f$.&lt;/p&gt;
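
&lt;p&gt;The decomposition can be checked numerically. The sketch below (pure Python, with hypothetical helper names &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;outer&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;mat_add&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;rank&lt;/code&gt;) builds one rank-1 term $c \cdot r$ per element of the range of $f$, sums the terms, and compares the number of terms with a rank computed by Gaussian elimination over the rationals. Gaussian elimination is used here only as a convenient independent check; it is not the definition of rank used in this post.&lt;/p&gt;

```python
from fractions import Fraction


def outer(c, r):
    """Matrix product of a column vector c and a row vector r."""
    return [[ci * rj for rj in r] for ci in c]


def mat_add(A, B):
    """Entrywise sum of two matrices of the same shape."""
    return [[a + b for a, b in zip(row_a, row_b)] for row_a, row_b in zip(A, B)]


def rank(M):
    """Rank of M, computed by Gaussian elimination with exact fractions."""
    rows = [[Fraction(x) for x in row] for row in M]
    rk = 0
    for col in range(len(rows[0])):
        pivot = next((i for i in range(rk, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[rk], rows[pivot] = rows[pivot], rows[rk]
        for i in range(len(rows)):
            if i != rk and rows[i][col] != 0:
                factor = rows[i][col] / rows[rk][col]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[rk])]
        rk += 1
    return rk


X = ['a', 'b', 'c']
Y = ['m', 's']
f = {'a': 's', 'b': 's', 'c': 'm'}
M_f = [[1 if f[x] == y else 0 for y in Y] for x in X]

# One rank-1 term per value y in the range of f:
# c marks the inputs that map to y, and r is the one-hot vector of y.
terms = []
for y in sorted(set(f.values())):
    c = [1 if f[x] == y else 0 for x in X]
    r = [1 if label == y else 0 for label in Y]
    terms.append(outer(c, r))

total = [[0] * len(Y) for _ in X]
for term in terms:
    total = mat_add(total, term)

print(total == M_f, len(terms), rank(M_f))
```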

&lt;h3 id=&quot;why-matrices-why-rank&quot;&gt;Why matrices? Why rank?&lt;/h3&gt;
&lt;p&gt;But why go through all that trouble? Why not just represent $f$ as a dictionary or hashmap, such as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;f = {&apos;a&apos;: &apos;s&apos;, &apos;b&apos;:&apos;s&apos;, &apos;c&apos;:&apos;m&apos;}&lt;/code&gt;?&lt;/p&gt;

&lt;p&gt;The reason is that linear algebra is the language of neural networks and &lt;a href=&quot;https://en.wikipedia.org/wiki/Multilayer_perceptron&quot; target=&quot;_blank&quot;&gt;multi-layer perceptrons (MLPs)&lt;/a&gt;. Indeed, a rank-1 matrix $M$ can be represented as an MLP with 1 hidden neuron, where the activation functions are all just the identity function:
&lt;img src=&quot;/images/rank_1_matrix.png&quot; alt=&quot;Rank 1 matrix as a neural network&quot; title=&quot;Rank 1 matrix&quot; /&gt;&lt;/p&gt;

&lt;p&gt;If $c \cdot r$ is the decomposition of $M$, the weights on the left are the entries of $c$, while the weights on the right are the entries of $r$.&lt;/p&gt;

&lt;p&gt;A rank-$k$ matrix is then an MLP with $k$ hidden neurons:
&lt;img src=&quot;/images/rank_k_matrix.png&quot; alt=&quot;Rank k matrix as a neural network&quot; title=&quot;Rank k matrix&quot; /&gt;&lt;/p&gt;

&lt;p&gt;So the rank of $M_f$ tells us how many hidden neurons are needed to represent the function $f$ when using a linear network (i.e. a network with identity activation functions).&lt;/p&gt;
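
&lt;p&gt;To make this concrete, here is a small sketch of such a linear network for the function $f$ from earlier, with one hidden neuron per element of the range of $f$. The weight matrices &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;C&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;R&lt;/code&gt; and the function &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;linear_mlp&lt;/code&gt; are names chosen for this illustration: the columns of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;C&lt;/code&gt; play the role of the column vectors $c$, and the rows of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;R&lt;/code&gt; play the role of the row vectors $r$.&lt;/p&gt;

```python
def vec_mat(v, M):
    """Multiply a row vector v by a matrix M (both plain Python lists)."""
    return [sum(v[i] * M[i][j] for i in range(len(M))) for j in range(len(M[0]))]


X = ['a', 'b', 'c']
Y = ['m', 's']
f = {'a': 's', 'b': 's', 'c': 'm'}

# One hidden neuron per value in the range of f.
values = sorted(set(f.values()))

# Input-to-hidden weights C (|X| x k): neuron j responds to inputs x with f(x) = values[j].
C = [[1 if f[x] == y else 0 for y in values] for x in X]

# Hidden-to-output weights R (k x |Y|): neuron j writes the one-hot vector of values[j].
R = [[1 if label == y else 0 for label in Y] for y in values]


def linear_mlp(x):
    """Pass the one-hot encoding of x through the two linear layers."""
    e_x = [1 if label == x else 0 for label in X]
    hidden = vec_mat(e_x, C)          # identity activation, so no non-linearity here
    return vec_mat(hidden, R)


# Decode each output by reading off the position of the 1.
outputs = {x: Y[linear_mlp(x).index(1)] for x in X}
print(outputs)
```

&lt;p&gt;The product of the two weight matrices is $M_f$ itself, so the hidden layer is just a factorization of $M_f$ through a $k$-dimensional space.&lt;/p&gt;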

&lt;p&gt;Of course, “real” MLPs usually have non-linear activation functions (such as ReLU or various sigmoids). As &lt;a href=&quot;https://arxiv.org/abs/2012.14913&quot; target=&quot;_blank&quot;&gt;this paper&lt;/a&gt; shows, such networks also store functions (or “key-value memories”). If we record which outputs we get when we pass inputs of $f$ into the network, we should expect to get a matrix that is similar to $M_f$.
In this setting, the number of hidden neurons continues to be a useful measure of the size or capacity of a network, even though it may no longer be the same as the rank of $M_f$ due to non-linearity.&lt;/p&gt;

&lt;h3 id=&quot;from-functions-to-databases&quot;&gt;From functions to databases&lt;/h3&gt;
&lt;p&gt;So far, we have seen that functions (or key-value pairs, associative memories, dictionaries, hashmaps, etc.) can be stored in matrices/MLPs. But functions or key-value pairs are not yet facts. Here’s what I mean: consider the key-value pair &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;(Michael Jordan, Basketball)&lt;/code&gt;, which is sometimes given as an example of a “fact”. Indeed, Michael Jordan is most strongly associated with basketball, and if you asked someone to imagine “Michael Jordan”, chances are they would imagine him playing basketball or in a basketball jersey.&lt;/p&gt;

&lt;p&gt;But what is happening here is that we are implicitly filling in a relation between Michael Jordan and basketball, such as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;played&lt;/code&gt; (for most of his career), or &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;is most often associated with&lt;/code&gt;. And the facticity of the pair depends on this relation. If we change the relation to something like &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;played in 1994&lt;/code&gt;, then the correct fact should mention &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;baseball&lt;/code&gt; instead of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;basketball&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;To put it another way, a function or a set of key-value pairs is only &lt;em&gt;one&lt;/em&gt; column in a database, and a fact consists not only of a key-value pair, but also the name of the column, which indicates the relationship between the key and the value. Going back to the example database at the top of this post, the pair &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;(Astrid, Singapore)&lt;/code&gt; is an incomplete fact. The full fact is &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;(Astrid, born_in, Singapore)&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Since a database is a collection of functions (one for each column in the database), and each function can be represented as a matrix, it stands to reason that a database can be represented as a collection of matrices, or a &lt;em&gt;tensor&lt;/em&gt;.&lt;/p&gt;

&lt;h3 id=&quot;tensor-representation-of-a-database&quot;&gt;Tensor representation of a database&lt;/h3&gt;
&lt;p&gt;A &lt;a href=&quot;https://en.wikipedia.org/wiki/Tensor&quot; target=&quot;_blank&quot;&gt;tensor&lt;/a&gt; is a higher-dimensional analogue of a matrix. Just as a matrix (or 2-tensor) can be thought of as a collection of vectors (or 1-tensors), a 3-tensor is a collection of matrices.&lt;/p&gt;

&lt;p&gt;We’ve already seen what the matrix for the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;born_in&lt;/code&gt; column looks like. Here’s the matrix for the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;lives_in&lt;/code&gt; column:&lt;/p&gt;

\[\begin{array}{c@{}c}
&amp;amp;
\begin{array}{cc}
m &amp;amp; s
\end{array}
\\
\begin{array}{c}
a \\ b \\ c
\end{array}
&amp;amp;
\left(
\begin{array}{cc}
1 &amp;amp; 0 
\\ 
0 &amp;amp; 1 
\\
1 &amp;amp; 0
\end{array} 
\right)
\end{array}\]

&lt;p&gt;We get a 3-tensor by stacking these two matrices (think of them as lying on different pages of a 2-page book).&lt;/p&gt;

&lt;p&gt;Meanwhile, the matrix representation of the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;currency&lt;/code&gt; column looks like this:&lt;/p&gt;

\[\begin{array}{c@{}c}
&amp;amp;
\begin{array}{cc}
r &amp;amp; d
\end{array}
\\
\begin{array}{c}
m \\ s
\end{array}
&amp;amp;
\left(
\begin{array}{cc}
1 &amp;amp; 0
\\ 
0 &amp;amp; 1
\end{array} 
\right)
\end{array}\]

&lt;p&gt;To stack this with the other two matrices, we need to pad them with zeros so that their shapes agree:&lt;/p&gt;

\[M_\beta = 
\begin{array}{cc}
    &amp;amp;
    \begin{array}{cccc}
    m &amp;amp; s &amp;amp; r &amp;amp; d 
    \end{array}
    \\
    \begin{array}{c}
    a \\ b \\ c \\ m \\ s
    \end{array}
    &amp;amp;
    \left(
    \begin{array}{cccc}
    0 &amp;amp; 1 &amp;amp; 0 &amp;amp; 0 
    \\
    0 &amp;amp; 1 &amp;amp; 0 &amp;amp; 0 
    \\
    1 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 
    \\
    0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 
    \\
    0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 
    \end{array}
    \right)
\end{array}\]

\[M_\lambda = 
\begin{array}{cc}
    &amp;amp;
    \begin{array}{cccc}
    m &amp;amp; s &amp;amp; r &amp;amp; d 
    \end{array}
    \\
    \begin{array}{c}
    a \\ b \\ c \\ m \\ s
    \end{array}
    &amp;amp;
    \left(
    \begin{array}{cccc}
    1 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 
    \\
    0 &amp;amp; 1 &amp;amp; 0 &amp;amp; 0 
    \\
    1 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 
    \\
    0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 
    \\
    0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 
    \end{array}
    \right)
\end{array}\]

\[M_\chi = 
\begin{array}{cc}
    &amp;amp;
    \begin{array}{cccc}
    m &amp;amp; s &amp;amp; r &amp;amp; d 
    \end{array}
    \\
    \begin{array}{c}
    a \\ b \\ c \\ m \\ s
    \end{array}
    &amp;amp;
    \left(
    \begin{array}{cccc}
    0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 
    \\
    0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 
    \\
    0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 
    \\
    0 &amp;amp; 0 &amp;amp; 1 &amp;amp; 0 
    \\
    0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 1 
    \end{array}
    \right)
\end{array}\]

&lt;p&gt;Here $\beta$ is the column &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;born_in&lt;/code&gt;, $\lambda$ is the column &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;lives_in&lt;/code&gt; and $\chi$ is the column &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;currency&lt;/code&gt;. 
Stacking these 3 matrices together gives us the $3 \times 5 \times 4$ tensor associated to this database.&lt;/p&gt;

&lt;p&gt;For the general definition, it helps to think of a database as a collection of &lt;a href=&quot;https://en.wikipedia.org/wiki/Semantic_triple&quot; target=&quot;_blank&quot;&gt;RDF triples&lt;/a&gt;. Although RDF triples are usually in (subject, predicate, object) or (key, query, value) format, for the purpose of this post, it will help to think of them in (predicate, subject, object) or (query, key, value) format instead. For example, our database above could be represented as the following set of triples:&lt;/p&gt;

\[\begin{array}{ccc}
	born\_in &amp;amp; Astrid &amp;amp; Singapore \\ 
	born\_in &amp;amp; Bernard &amp;amp; Singapore \\ 
	born\_in &amp;amp; Colin &amp;amp; Malaysia \\ 
	lives\_in &amp;amp; Astrid &amp;amp; Malaysia \\ 
	lives\_in &amp;amp; Bernard &amp;amp; Singapore \\ 
	lives\_in &amp;amp; Colin &amp;amp; Malaysia \\
	currency &amp;amp; Malaysia &amp;amp; Ringgit \\
	currency &amp;amp; Singapore &amp;amp; Dollar
 \end{array}\]

&lt;p&gt;If we let $\mathcal{D}$ be this list of triples, we may define a 3-tensor $D$ of the appropriate size whose entries are:&lt;/p&gt;

\[D_{qkv} = \begin{cases}
1 \mbox{ if } (q,k,v) \in \mathcal{D}, \mbox{ and}\\
0 \mbox{ otherwise.}
\end{cases}\]

&lt;p&gt;Just as the matrix $M_f$ contains all the information of the function $f$, the tensor $D$ contains all the information of the database $\mathcal{D}$.&lt;/p&gt;
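
&lt;p&gt;The definition of $D$ translates directly into code. The sketch below builds $D$ as a nested Python list indexed as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;D[q][k][v]&lt;/code&gt;. The orderings of the queries, keys and values are assumptions chosen to match the padded matrices $M_\beta$, $M_\lambda$ and $M_\chi$ above; any other ordering would give the same tensor up to a permutation of indices.&lt;/p&gt;

```python
# The database as a list of (query, key, value) triples.
triples = [
    ('born_in', 'Astrid', 'Singapore'),
    ('born_in', 'Bernard', 'Singapore'),
    ('born_in', 'Colin', 'Malaysia'),
    ('lives_in', 'Astrid', 'Malaysia'),
    ('lives_in', 'Bernard', 'Singapore'),
    ('lives_in', 'Colin', 'Malaysia'),
    ('currency', 'Malaysia', 'Ringgit'),
    ('currency', 'Singapore', 'Dollar'),
]

# Index orderings (assumed, to match the padded matrices in the post).
queries = ['born_in', 'lives_in', 'currency']
keys = ['Astrid', 'Bernard', 'Colin', 'Malaysia', 'Singapore']
values = ['Malaysia', 'Singapore', 'Ringgit', 'Dollar']

# Start from an all-zero tensor and set D[q][k][v] = 1 for each triple.
D = [[[0 for _ in values] for _ in keys] for _ in queries]
for q, k, v in triples:
    D[queries.index(q)][keys.index(k)][values.index(v)] = 1

shape = (len(D), len(D[0]), len(D[0][0]))
print(shape)
for page in D:
    print(page)
```

&lt;p&gt;Each page &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;D[q]&lt;/code&gt; is the padded matrix of one column of the database, and the number of non-zero entries equals the number of triples.&lt;/p&gt;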

&lt;p&gt;&lt;em&gt;Technical note: In the paper &lt;a href=&quot;https://arxiv.org/abs/2502.05076&quot; target=&quot;_blank&quot;&gt;‘Paying attention to facts’&lt;/a&gt;, the triples are in (key,query,value) format and the entries of $D$ are indexed by $k,q,v$ instead of $q,k,v$.&lt;/em&gt;&lt;/p&gt;

&lt;h3 id=&quot;summary-and-whats-next&quot;&gt;Summary and what’s next&lt;/h3&gt;
&lt;p&gt;In this post, we have seen how to represent a function as a matrix, and a database as a 3-tensor. We have also touched upon the relationship between functions and MLPs. Note that a database &lt;em&gt;cannot&lt;/em&gt; be represented as a single MLP, because MLPs only store key-value pairs! But as we will eventually see, we can encode databases in the attention heads of transformers (see the &lt;a href=&quot;https://arxiv.org/abs/2502.05076&quot; target=&quot;_blank&quot;&gt;‘Paying attention to facts’ paper&lt;/a&gt; to skip ahead).&lt;/p&gt;

&lt;p&gt;While we have defined and computed the rank of $M_f$, we haven’t talked about the rank of tensors. That will be the subject of the next post. We will also see that tensor rank has some unintuitive properties (at least for those who are more familiar with matrix rank).&lt;/p&gt;

&lt;h4 id=&quot;getting-involved&quot;&gt;Getting involved&lt;/h4&gt;
&lt;p&gt;If this line of work interests you, please reach out in the comments below, or email me at &lt;a href=&quot;mailto:liangze.wong@gmail.com&quot;&gt;liangze.wong@gmail.com&lt;/a&gt;.
I would be happy to collaborate on topics related to interpretability and theory of language models, even if they do not involve tensors or factual recall.&lt;/p&gt;

&lt;p&gt;If you are hiring (or know of anyone who might be interested in hiring) research scientists with my background, I am currently open to work, and am open to re-location!&lt;/p&gt;
</description>
        <pubDate>Sun, 16 Mar 2025 00:00:00 +0000</pubDate>
        <link>http://sheaves.github.io/Database-Tensors/</link>
        <guid isPermaLink="true">http://sheaves.github.io/Database-Tensors/</guid>
      </item>
      
    
      
      <item>
        <title>Tensors and Factual Recall in Language Models</title>
        <description>&lt;p&gt;This is the first in a &lt;a href=&quot;https://sheaves.github.io/topics/#Tensors+and+Language+Models&quot; target=&quot;_blank&quot;&gt;series of posts&lt;/a&gt; explaining and expanding on the ideas introduced in the following papers:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;Paying attention to facts: quantifying the knowledge capacity of attention layers&lt;br /&gt;
[&lt;a href=&quot;https://arxiv.org/abs/2502.05076&quot; target=&quot;_blank&quot;&gt;arXiv:2502.05076&lt;/a&gt;]&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;‘Generalization is hallucination’ through the lens of tensor completions&lt;br /&gt;
[&lt;a href=&quot;https://arxiv.org/abs/2502.17305&quot; target=&quot;_blank&quot;&gt;arXiv:2502.17305&lt;/a&gt;]&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;!--more--&gt;

&lt;p&gt;The overarching theme behind these papers is that tensors and tensor completions are a useful theoretical framework for thinking about language models, specifically generative models based on &lt;a href=&quot;https://en.wikipedia.org/wiki/Transformer_(deep_learning_architecture)&quot; target=&quot;_blank&quot;&gt;transformers&lt;/a&gt;.
As I’ve only started doing this in 2025, everything is still quite preliminary. In particular, the results of these papers only hold for very simple toy models and synthetic datasets.&lt;/p&gt;

&lt;p&gt;Nevertheless, this line of inquiry has already yielded some interesting findings.
For example, the &lt;a href=&quot;https://arxiv.org/abs/2502.05076&quot; target=&quot;_blank&quot;&gt;‘Paying attention to facts’ paper&lt;/a&gt; shows that transformers can store facts in their attention heads and not just their MLP layers. It also shows that we can increase model capacity without increasing the number of parameters by having larger output-value weights and smaller key-query weights.
Meanwhile, the &lt;a href=&quot;https://arxiv.org/abs/2502.17305&quot; target=&quot;_blank&quot;&gt;‘Generalization is hallucination’ paper&lt;/a&gt; identifies a common mechanism that shows how training data gives rise to certain types of generalizations and hallucinations.&lt;/p&gt;

&lt;p&gt;In this series of blog posts, I plan to explain and expand on some of the ideas in these papers, in the hopes of encouraging more research in this direction.
In the next few posts, I will introduce tensors and their rank, relate them to language datasets and language models, talk about databases and ways of generating ‘random databases’, and highlight the non-linear effects of argmax and softmax on rank.&lt;/p&gt;

&lt;p&gt;But this post will be less technical. These papers may broadly be described as using &lt;em&gt;tensors&lt;/em&gt; to understand &lt;em&gt;factual recall&lt;/em&gt; in language models, and so in this post I hope to explain some of the motivations behind these papers by answering two questions, “Why facts?” and “Why tensors?”&lt;/p&gt;

&lt;h3 id=&quot;why-facts&quot;&gt;Why facts?&lt;/h3&gt;
&lt;p&gt;Factual recall in language models has some nice properties as a research topic. 
Firstly, facts are simple. 
At their core, facts can be represented as subject-predicate-object (or subject-relation-object) triples, such as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;(France, capital_is, Paris)&lt;/code&gt;, although they may be represented in other ways in natural language.&lt;/p&gt;

&lt;p&gt;Not only are facts easy to represent, it is usually the case that there is one right answer: given the pair &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;(France, capital_is)&lt;/code&gt;, you would expect a model to output &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Paris&lt;/code&gt;. 
This turns factual recall into a supervised task, and removes the complication of having multiple acceptable outputs, which is common in many other next-token prediction tasks.
On a related note, databases of facts (be it in traditional tabular databases, or graph databases, or RDF triples) are readily available as supervised datasets, and are also easy to synthetically generate.&lt;/p&gt;

&lt;p&gt;In addition to the nature of facts themselves, it seems to be generally accepted that LLMs can store facts, and that this is a desirable behaviour.
It is thus important to study factual recall, because a better understanding will enable us to better control this behaviour (to increase the storage capacity, or reduce factual hallucinations, for example).&lt;/p&gt;

&lt;p&gt;I think it is in part because of these reasons that the question of how LLMs memorize facts has received a fair amount of attention, as these examples demonstrate:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://arxiv.org/abs/2402.07321&quot; target=&quot;_blank&quot;&gt;Summing up the facts: Additive mechanisms behind factual recall in LLMs&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://aclanthology.org/2024.findings-emnlp.658/&quot; target=&quot;_blank&quot;&gt;Scaling laws for fact memorization of large language models&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://openreview.net/forum?id=hwSmPOAmhk&quot; target=&quot;_blank&quot;&gt;Understanding factual recall in transformers via associative memories&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://arxiv.org/abs/2309.14316&quot; target=&quot;_blank&quot;&gt;Physics of language models: Part 3.1,
knowledge storage and extraction&lt;/a&gt; and &lt;a href=&quot;https://arxiv.org/abs/2404.05405&quot; target=&quot;_blank&quot;&gt;Part 3.3,
knowledge capacity scaling laws&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://aclanthology.org/2021.emnlp-main.446.pdf&quot; target=&quot;_blank&quot;&gt;Transformer Feed-Forward Layers Are Key-Value Memories&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Despite the simplicity of facts, however, and despite the research that has gone into it, we still do not have a full picture of how facts are stored in language models.
For example, papers about scaling laws of memorization capacity tend to quantify the size of a database of facts in terms of the number of facts contained. 
However, this ignores the structure of facts and patterns among facts that might make one database easier or harder to memorize than another database with the same number of facts.&lt;/p&gt;

&lt;p&gt;My sense is that the number of facts in a database is just a proxy for another measure of database size that better reflects the underlying properties of the database.
While I have not uncovered this measure yet, I think my approach (in terms of tensors) has brought us a step closer to it.&lt;/p&gt;

&lt;p&gt;Which leads us to the next question…&lt;/p&gt;

&lt;h3 id=&quot;why-tensors&quot;&gt;Why tensors?&lt;/h3&gt;
&lt;p&gt;When I first started this line of investigation, I did not have tensors in mind at all.
I was simply trying to find a way to think about facts represented as &lt;a href=&quot;https://en.wikipedia.org/wiki/Semantic_triple&quot; target=&quot;_blank&quot;&gt;RDF triples&lt;/a&gt;.
I wanted a setting that allowed me to talk about an RDF triple being “in the span” of other triples in a precise sense.
Since each RDF triple has 3 parts (subject, predicate and object), it seemed natural to use a 3-tensor rather than a matrix to encode this information.
From there, it was a natural extension to $n$-tensors for more general $n$-grams.&lt;/p&gt;

&lt;p&gt;I do not think that tensors are &lt;em&gt;the&lt;/em&gt; way to understand language models, merely &lt;em&gt;one&lt;/em&gt; way.
Each framework (be it embeddings, or neurons, or dictionary learning, or …) sheds light on language models from a different angle, and all I hope to do is to introduce another angle to the story.
I think a lot can be gained by also trying to understand the interplay between different frameworks.
For example, can we relate linear-algebraic properties of a tensor to geometric properties of embeddings that result from decomposing the tensor?
Can we relate linear-algebraic properties of tensors to neuron activations?&lt;/p&gt;

&lt;h3 id=&quot;up-next&quot;&gt;Up next&lt;/h3&gt;
&lt;p&gt;I’ve talked a lot about tensors without really defining them. In the &lt;a href=&quot;https://sheaves.github.io/Database-Tensors/&quot;&gt;next post&lt;/a&gt;, I will define tensors and relate them to facts represented as RDF triples.&lt;/p&gt;

&lt;h4 id=&quot;getting-involved&quot;&gt;Getting involved&lt;/h4&gt;
&lt;p&gt;If this line of work interests you, please reach out in the comments below, or email me at &lt;a href=&quot;mailto:liangze.wong@gmail.com&quot;&gt;liangze.wong@gmail.com&lt;/a&gt;.
I would be happy to collaborate on topics related to interpretability and theory of language models, even if they do not involve tensors or factual recall.&lt;/p&gt;

&lt;p&gt;If you are hiring (or know of anyone who might be interested in hiring) research scientists with my background, I am currently open to work, and am open to re-location!&lt;/p&gt;
</description>
        <pubDate>Thu, 13 Mar 2025 00:00:00 +0000</pubDate>
        <link>http://sheaves.github.io/Tensors-Facts-LLMs/</link>
        <guid isPermaLink="true">http://sheaves.github.io/Tensors-Facts-LLMs/</guid>
      </item>
      
    
      
    
      
      <item>
        <title>Distributive Laws</title>
        <description>&lt;p&gt;I’ve been participating in the &lt;a href=&quot;http://www.math.jhu.edu/~eriehl/kanII/&quot;&gt;Kan Extension Seminar II&lt;/a&gt;, and this week it’s my turn to &lt;a href=&quot;https://golem.ph.utexas.edu/category/2017/02/distributive_laws.html&quot;&gt;post about Jon Beck’s “Distributive Laws”&lt;/a&gt; at the &lt;a href=&quot;https://golem.ph.utexas.edu/category/&quot;&gt;n-Category Cafe&lt;/a&gt;!&lt;/p&gt;

&lt;p&gt;The post uses lots of string diagrams for monads, resulting in pictures like the following:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/distributive/comp_assoc.png&quot; alt=&quot;&quot; title=&quot;Associativity of the composite monad&quot; /&gt;&lt;/p&gt;

&lt;p&gt;See you &lt;a href=&quot;https://golem.ph.utexas.edu/category/2017/02/distributive_laws.html&quot;&gt;there&lt;/a&gt;!&lt;/p&gt;
</description>
        <pubDate>Sat, 18 Feb 2017 00:00:00 +0000</pubDate>
        <link>http://sheaves.github.io/Distributive-Laws/</link>
        <guid isPermaLink="true">http://sheaves.github.io/Distributive-Laws/</guid>
      </item>
      
    
      
      <item>
        <title>Noncommutative Algebras in Sage</title>
        <description>&lt;p&gt;In this post, I’ll demonstrate 3 ways to define non-commutative rings in Sage. They’re essentially different ways of expressing the non-commutative relations in the ring:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;&lt;a href=&quot;http://doc.sagemath.org/html/en/reference/algebras/sage/algebras/free_algebra.html#sage.algebras.free_algebra.FreeAlgebra_generic.g_algebra&quot; target=&quot;_blank&quot;&gt;Via &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;g_algebra&lt;/code&gt;&lt;/a&gt;: define the relations directly&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;http://www.sagemath.org/documentation/html/en/reference/polynomial_rings/sage/rings/polynomial/plural.html&quot; target=&quot;_blank&quot;&gt;Via &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NCPolynomialRing_plural&lt;/code&gt;&lt;/a&gt;: define a pair of structural matrices&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;http://doc.sagemath.org/html/en/reference/rings/sage/rings/quotient_ring.html&quot; target=&quot;_blank&quot;&gt;Via a quotient of a letterplace ring&lt;/a&gt;: define the ideal generated by the relations (only works for homogeneous relations)&lt;/li&gt;
&lt;/ol&gt;

&lt;!--more--&gt;

&lt;p&gt;As far as I know, all 3 methods rely on Sage’s interface with &lt;a href=&quot;https://www.singular.uni-kl.de/index.php&quot; target=&quot;_blank&quot;&gt;Singular&lt;/a&gt; and its non-commutative extension &lt;a href=&quot;https://www.singular.uni-kl.de/Manual/4-0-2/sing_469.htm&quot; target=&quot;_blank&quot;&gt;Plural&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;In addition to all the documentation linked above, I also relied heavily on  Greuel and Pfister’s &lt;a href=&quot;http://www.cimpa-icpam.org/archivesecoles/20130130100834/singularbuch1-210.pdf&quot; target=&quot;_blank&quot;&gt;&lt;em&gt;A Singular Introduction to
Commutative Algebra&lt;/em&gt;&lt;/a&gt;. Despite the title, it does have a pretty substantial section (1.9) devoted to non-commutative $G$-algebras.&lt;/p&gt;

&lt;h2 id=&quot;umathfraksl_2-and-its-homogenization&quot;&gt;$U(\mathfrak{sl}_2)$ and its homogenization&lt;/h2&gt;

&lt;p&gt;The running example throughout this post will be the universal enveloping algebra $U(\mathfrak{sl}_2)$ over $\mathbb{Q}$.&lt;/p&gt;

&lt;p&gt;We’ll define this to be the (non-commutative) $\mathbb{Q}$-algebra $U$ with generators $e,f,h$ subject to the relations&lt;/p&gt;

\[[e,f] = h, \qquad [h,e] = 2e, \qquad [h,f] = -2f.\]

&lt;p&gt;If we set $e,f,h$ to have degree 1, these relations are not homogeneous. Their left-hand sides only have degree 2 terms, while their right-hand sides only have degree 1 terms. This is fine with the first two methods, but won’t work for method 3 (which requires homogeneous relations).&lt;/p&gt;

&lt;p&gt;To demonstrate the third method, we’ll define the $\mathbb{Q}$-algebra $H$ with generators $e,f,h,t$ subject to the homogeneous relations&lt;/p&gt;

\[[t,e] = [t,f] = [t,h] = 0,\]

\[[e,f] = ht, \qquad [h,e] = 2et, \qquad [h,f] = -2ft.\]

&lt;p&gt;We can obtain $U$ both as a quotient and a localization of $H$:&lt;/p&gt;

\[H/(1-t) \;\;\cong\;\; U \;\;\cong\;\; H[t^{-1}]_0.\]

&lt;h2 id=&quot;g-algebras&quot;&gt;$G$-algebras&lt;/h2&gt;
&lt;p&gt;Using the  &lt;a href=&quot;http://doc.sagemath.org/html/en/reference/algebras/sage/algebras/free_algebra.html#sage.algebras.free_algebra.FreeAlgebra_generic.g_algebra&quot; target=&quot;_blank&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;g_algebra&lt;/code&gt;&lt;/a&gt; method of Sage’s &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;FreeAlgebra&lt;/code&gt; class, we can simply plug our noncommutative relations in, and get our non-commutative ring. This is about as easy as it gets:&lt;/p&gt;

&lt;div class=&quot;sage&quot;&gt;
  &lt;script type=&quot;text/x-sage&quot;&gt;
F.&lt;e,f,h&gt; = FreeAlgebra(QQ,3)
U = F.g_algebra({f*e: e*f - h, h*e: e*h + 2*e, h*f: f*h-2*f})
U
  &lt;/script&gt;
&lt;/div&gt;

&lt;p&gt;Let’s unravel what’s going on here.&lt;/p&gt;

&lt;h3 id=&quot;monomial-orderings-and-pbw-basis&quot;&gt;Monomial orderings and PBW basis&lt;/h3&gt;
&lt;p&gt;Most algorithms for commutative and non-commutative rings require an ordering on the generators. In our case, let’s use the ordering&lt;/p&gt;

\[e \leq f \leq h.\]

&lt;p&gt;This is implicitly stated in our code: we wrote &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;F.&amp;lt;e,f,h&amp;gt;&lt;/code&gt; instead of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;F.&amp;lt;h,e,f&amp;gt;&lt;/code&gt;, for example.&lt;/p&gt;

&lt;p&gt;A &lt;em&gt;standard word&lt;/em&gt; is a monomial of the form&lt;/p&gt;

\[e^if^jh^k, \qquad i,j,k \in \mathbb{N}\]

&lt;p&gt;In the polynomial ring $\mathbb{Q}[e,f,h]$, every monomial can be expressed in this form, so the set of standard words forms a $\mathbb{Q}$-basis for $\mathbb{Q}[e,f,h]$.&lt;/p&gt;

&lt;p&gt;In a non-commutative ring, whether or not the standard words form a basis depends on what relations we have. Such a basis, if it exists, is called a &lt;a href=&quot;https://en.wikipedia.org/wiki/Poincar%C3%A9%E2%80%93Birkhoff%E2%80%93Witt_theorem&quot; target=&quot;_blank&quot;&gt;PBW basis&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The free algebra $F = \mathbb{Q}\langle e,f,h\rangle$ has no relations, so it does not have a PBW basis: the non-standard word $fe$, for instance, cannot be written in terms of standard words. Fortunately, our algebra $U$ does have a PBW basis.&lt;/p&gt;

&lt;p&gt;This means that we can always express a non-standard monomial (e.g. $fe$) as a sum of standard monomials (e.g. $ef - h$). The non-commutative relations that define $U$ can thus be thought of as an algorithm for turning non-standard words into sums of standard words.&lt;/p&gt;
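
&lt;p&gt;To make the “algorithm” point of view concrete, here is a minimal sketch in plain Python (no Sage needed) that keeps applying the three relations for $U$ until only standard words remain. The names &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;RULES&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;normal_form&lt;/code&gt; are made up for this illustration, and this is only a toy: it is not how Sage or Singular work internally.&lt;/p&gt;

```python
# Rewriting rules for U(sl_2), copied from the relations in the text.
# Each non-standard pair maps to a list of (coefficient, word) terms.
RULES = {
    ("f", "e"): [(1, ("e", "f")), (-1, ("h",))],
    ("h", "e"): [(1, ("e", "h")), (2, ("e",))],
    ("h", "f"): [(1, ("f", "h")), (-2, ("f",))],
}

def normal_form(word):
    """Rewrite a word (tuple of letters) as a dict {standard word: coefficient}."""
    result = {}
    todo = [(1, tuple(word))]
    while todo:
        coeff, w = todo.pop()
        # Look for the first adjacent pair that is out of order.
        for i in range(len(w) - 1):
            pair = (w[i], w[i + 1])
            if pair in RULES:
                for c, replacement in RULES[pair]:
                    todo.append((coeff * c, w[:i] + replacement + w[i + 2:]))
                break
        else:
            # No out-of-order pair was found, so w is a standard word.
            result[w] = result.get(w, 0) + coeff
    # Drop terms whose coefficients cancelled.
    return {w: c for w, c in result.items() if c != 0}

print(normal_form(("f", "e")))
print(normal_form(("h", "f", "e")))
```

&lt;p&gt;The second call rewrites $hfe$ as $efh - h^2$. Words are stored as tuples of letters, so $h^2$ shows up as the tuple (h, h).&lt;/p&gt;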

&lt;p&gt;To do this in Sage, we define a &lt;a href=&quot;https://docs.python.org/2/tutorial/datastructures.html#dictionaries&quot; target=&quot;_blank&quot;&gt;dictionary&lt;/a&gt; whose keys are non-standard words and whose values are the sums of standard words they become.&lt;/p&gt;

&lt;p&gt;In the above example, our dictionary was short enough to fit into one line, but we could also define a dictionary separately and pass it into &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;g_algebra&lt;/code&gt;:&lt;/p&gt;

&lt;div class=&quot;sage&quot;&gt;
  &lt;script type=&quot;text/x-sage&quot;&gt;
F.&lt;e,f,h&gt; = FreeAlgebra(QQ,3)
 
U_relations = {
   f*e : e*f - h,
   h*e : e*h + 2*e,
   h*f : f*h - 2*f
}
 
U = F.g_algebra(U_relations)
U
   &lt;/script&gt;
&lt;/div&gt;

&lt;p&gt;It’s very important that the keys are non-standard words and the values are sums of standard words. Mathematically, the relation $fe = ef - h$ is the same as $ef = fe + h$, but if we replace &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;f*e : e*f - h&lt;/code&gt; with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;e*f : f*e + h&lt;/code&gt; in the code, we’ll get an error (try it!).&lt;/p&gt;

&lt;h3 id=&quot;what-are-g-algebras&quot;&gt;What are $G$-algebras?&lt;/h3&gt;
&lt;p&gt;$U$ has a PBW basis because it is a $G$-algebra. Briefly, $G$-algebras are algebras whose relations satisfy certain non-degeneracy conditions that make the algebra nice to work with.&lt;/p&gt;

&lt;p&gt;For a full definition of $G$-algebras, refer to &lt;a href=&quot;http://www.cimpa-icpam.org/archivesecoles/20130130100834/singularbuch1-210.pdf&quot; target=&quot;_blank&quot;&gt;&lt;em&gt;A Singular Introduction to Commutative Algebra&lt;/em&gt;&lt;/a&gt; or the &lt;a href=&quot;https://www.singular.uni-kl.de/Manual/4-0-2/sing_534.htm#SEC573&quot; target=&quot;_blank&quot;&gt;Plural manual&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;If $A$ is a $G$-algebra, then it has a PBW basis, is left and right Noetherian, and is an integral domain. More importantly (for this site at least!), it means that we can define $A$ in Singular/Plural, and hence in Sage.&lt;/p&gt;

&lt;h2 id=&quot;structural-matrices-for-a-g-algebra&quot;&gt;Structural matrices for a $G$-algebra&lt;/h2&gt;
&lt;p&gt;Another way of writing our non-commutative relations is&lt;/p&gt;

\[\begin{pmatrix}
0 &amp;amp; fe &amp;amp; he \\
0 &amp;amp; 0  &amp;amp; hf \\
0 &amp;amp; 0 &amp;amp; 0
\end{pmatrix}
=
\begin{pmatrix}
0 &amp;amp; 1 &amp;amp; 1 \\
0 &amp;amp; 0 &amp;amp; 1 \\
0 &amp;amp; 0 &amp;amp; 0
\end{pmatrix}
*
\begin{pmatrix}
0 &amp;amp; ef &amp;amp; eh \\
0 &amp;amp; 0  &amp;amp; fh \\
0 &amp;amp; 0 &amp;amp; 0
\end{pmatrix}
+
\begin{pmatrix}
0 &amp;amp; -h &amp;amp; 2e \\
0 &amp;amp; 0  &amp;amp; -2f \\
0 &amp;amp; 0 &amp;amp; 0
\end{pmatrix},\]

&lt;p&gt;where $ * $ denotes element-wise multiplication (so there isn’t any linear algebra going on here; we’re just using matrices to organize the information). Let $N,C,S,D$ be the matrices above, in that order, so that $N = C*S + D$.&lt;/p&gt;

&lt;p&gt;If we let $x_1 = e, x_2 = f, x_3 = h$ (so that $x_i \leq x_j$ if $i \leq j$) then for $i &amp;lt; j$&lt;/p&gt;

\[n_{ij} = x_j x_i, \qquad s_{ij} = x_i x_j.\]

&lt;p&gt;In other words, $N$ contains the non-standard words that we’re trying to express in terms of the standard words in $S$.&lt;/p&gt;

&lt;p&gt;The matrices $C$ and $D$ are called the &lt;em&gt;structural matrices&lt;/em&gt; of the $G$-algebra, and their entries are such that our relations may be written&lt;/p&gt;

\[x_jx_i = c_{ij} x_i x_j  + d_{ij}, \qquad i &amp;lt; j\]

&lt;p&gt;with zeros everywhere else ($i \geq j$). If $c_{ij} = 1$ for all $i &amp;lt; j$ and $D = 0$, the resulting algebra will be commutative.&lt;/p&gt;

&lt;p&gt;We can use the structural matrices $C$ and $D$ to define our algebra via Sage’s &lt;a href=&quot;http://www.sagemath.org/documentation/html/en/reference/polynomial_rings/sage/rings/polynomial/plural.html&quot; target=&quot;_blank&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NCPolynomialRing_plural&lt;/code&gt;&lt;/a&gt; class (note that Python uses zero-indexing for matrices, so $c_{12}$ becomes &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;C[0,1]&lt;/code&gt;):&lt;/p&gt;

&lt;div class=&quot;sage&quot;&gt;
  &lt;script type=&quot;text/x-sage&quot;&gt;
from sage.rings.polynomial.plural import NCPolynomialRing_plural

R = QQ[&apos;e&apos;,&apos;f&apos;,&apos;h&apos;]
R.inject_variables()

C = matrix(R,3)
D = matrix(R,3)

C[0,1] = 1
C[0,2] = 1
C[1,2] = 1

D[0,1] = -h
D[0,2] = 2*e
D[1,2] = -2*f

show(C)
show(D)

U.&lt;e,f,h&gt; = NCPolynomialRing_plural(QQ, c = C, d = D, order = TermOrder(&apos;lex&apos;,3), category = Algebras(QQ))
U
  &lt;/script&gt;
&lt;/div&gt;

&lt;p&gt;Note that &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;R&lt;/code&gt; is a commutative polynomial ring. In fact, up till the point where we call &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NCPolynomialRing_plural&lt;/code&gt;, even the variables &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;e,f,h&lt;/code&gt; are treated as commutative variables.&lt;/p&gt;

&lt;p&gt;This method of defining $U$ is considerably longer and more prone to mistakes than using &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;g_algebra&lt;/code&gt;. As stated in the &lt;a href=&quot;http://www.sagemath.org/documentation/html/en/reference/polynomial_rings/sage/rings/polynomial/plural.html&quot; target=&quot;_blank&quot;&gt;documentation&lt;/a&gt;, this is not intended for use! I’m including it here because this is essentially how one would go about defining a $G$-algebra in Singular. In fact, the Sage method &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;g_algebra&lt;/code&gt; calls &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NCPolynomialRing_plural&lt;/code&gt;, which in turn calls Singular.&lt;/p&gt;

&lt;h2 id=&quot;quotients-of-letterplace-rings&quot;&gt;Quotients of letterplace rings&lt;/h2&gt;
&lt;p&gt;Our final method for defining non-commutative rings makes use of &lt;a href=&quot;http://doc.sagemath.org/html/en/reference/algebras/sage/algebras/letterplace/free_algebra_letterplace.html&quot; target=&quot;_blank&quot;&gt;Sage’s implementation of Singular’s letterplace rings&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;As mentioned at the start of this post, this method requires the relations to be homogeneous, so we’ll work with $H$ instead of $U$.&lt;/p&gt;

&lt;p&gt;Let $\mathbb{Q}\langle e,f,h,t \rangle$ be the free algebra on 4 variables. Consider the two-sided ideal $I$ generated by the relations for $H$:&lt;/p&gt;

\[I = (te - et, tf - ft, th - ht, ef - fe - ht, he - eh - 2et, hf - fh + 2ft)\]

&lt;p&gt;Then&lt;/p&gt;

\[H = \mathbb{Q}\langle e,f,h,t \rangle/I.\]

&lt;p&gt;This can be expressed Sage-ly:&lt;/p&gt;

&lt;div class=&quot;sage&quot;&gt;
  &lt;script type=&quot;text/x-sage&quot;&gt;
F.&lt;e,f,h,t&gt; = FreeAlgebra(QQ, implementation=&apos;letterplace&apos;)

I = [
    t*e - e*t,
    t*f - f*t,
    t*h - h*t,
    e*f - f*e - h*t,
    h*e - e*h - 2*e*t,
    h*f - f*h + 2*f*t
]

H = F.quotient(F * I * F)
H
  &lt;/script&gt;
&lt;/div&gt;

&lt;p&gt;The expression &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;F*I*F&lt;/code&gt; is the two-sided ideal generated by elements in the list &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;I&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Although $U$ cannot be defined using this method, $H$ can be defined using all three methods. As a (fun?) exercise, try defining $H$ using the other two methods.&lt;/p&gt;

&lt;h2 id=&quot;difficulties&quot;&gt;Difficulties&lt;/h2&gt;
&lt;p&gt;These methods can be used to define many non-commutative algebras such as the Weyl algebra and various enveloping algebras of Lie algebras. One can also define these algebras over fields other than $\mathbb{Q}$, such as $\mathbb{F}_p$ (edit: but unfortunately not $\mathbb{C}$ or $\mathbb{R}$).&lt;/p&gt;

&lt;p&gt;However, we cannot define algebras over $\mathbb{Q}(q)$, the fraction field of $\mathbb{Q}[q]$:&lt;/p&gt;

&lt;div class=&quot;sage&quot;&gt;
  &lt;script type=&quot;text/x-sage&quot;&gt;
Qq = QQ[&apos;q&apos;].fraction_field()
Qq.inject_variables()

F.&lt;x,y&gt; = FreeAlgebra(Qq,2)

F.g_algebra({y*x : q*x*y})  
  &lt;/script&gt;
&lt;/div&gt;

&lt;p&gt;This is a problem if we want to define rings with relations such as&lt;/p&gt;

\[yx = qxy.\]

&lt;p&gt;Such relations occur frequently when studying quantum groups, for example.&lt;/p&gt;

&lt;p&gt;This is surprising, because one can easily define $\mathbb{Q}(q)$ and non-commutative $\mathbb{Q}(q)$-algebras in Singular/Plural, which is what Sage is using. It seems that the problem is in Sage’s wrapper for Singular/Plural, because Sage can’t even pass the ring $\mathbb{Q}(q)$ to Singular.&lt;/p&gt;

&lt;p&gt;There’s a &lt;a href=&quot;http://trac.sagemath.org/ticket/14886&quot; target=&quot;_blank&quot;&gt;trac ticket&lt;/a&gt; for this problem, but until it gets resolved, we’ll just have to define such rings directly in Singular/Plural. Thanks to the amazing capabilities of the &lt;a href=&quot;https://sagecell.sagemath.org/&quot; target=&quot;_blank&quot;&gt;Sage Cell Server&lt;/a&gt;, &lt;del&gt;we’ll do this in the next post&lt;/del&gt;!&lt;/p&gt;
</description>
        <pubDate>Thu, 03 Mar 2016 00:00:00 +0000</pubDate>
        <link>http://sheaves.github.io/Noncommutative-Sage/</link>
        <guid isPermaLink="true">http://sheaves.github.io/Noncommutative-Sage/</guid>
      </item>
      
    
      
      <item>
        <title>The Weyl Algebra and $\mathfrak{sl}_2$</title>
        <description>&lt;!--more--&gt;

&lt;p&gt;I’ve been away from this blog for quite a while - almost a year, in fact! My excuses are my wedding and the prelims (a.k.a. quals), as well as all the preparation that had to go into them (although, to be honest, those things only occupied me till September last year!).&lt;/p&gt;

&lt;p&gt;Looking back at my previous posts, I’ve realized that in attempting to teach &lt;em&gt;both&lt;/em&gt; math and code, I probably ended up doing neither. This is really not the best place to learn representation theory (for example) - there are better books and blogs out there. Also, most of the code that I wrote to illustrate those posts feels contrived, and neither highlights Sage’s strengths nor reflects how I normally use Sage for my assignments and projects.&lt;/p&gt;

&lt;p&gt;I’ve thus decided to write shorter posts with code that I actually use (on &lt;a href=&quot;https://cloud.sagemath.com/&quot; target=&quot;_blank&quot;&gt;SageMathCloud&lt;/a&gt;), along with some explanations of the code. Lately, I’ve been writing code for non-commutative algebra and combinatorics, so today I’ll start with a simple example of a non-commutative algebra.&lt;/p&gt;

&lt;h2 id=&quot;the-weyl-algebra&quot;&gt;The Weyl Algebra&lt;/h2&gt;
&lt;p&gt;The $1$-dim. Weyl algebra is the (non-commutative) algebra generated by $x, \partial_x$ subject to the relation&lt;/p&gt;

\[x \partial_x = \partial_x x - 1.\]

&lt;p&gt;If we treat $x$ as “multiplication by $x$” and $\partial_x$ as “differentiation w.r.t. $x$”, this relation is really just an application of the product rule:&lt;/p&gt;

\[\partial_x (x f(x)) = f(x) + x \partial_x f(x).\]
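
&lt;p&gt;The relation can also be checked by hand on a concrete polynomial. Here is a small sketch in plain Python, assuming polynomials are stored as coefficient lists with the constant term first. The helper names are hypothetical and are not part of Sage.&lt;/p&gt;

```python
def times_x(p):
    """Multiply a polynomial (coefficient list, constant term first) by x."""
    return [0] + list(p)

def d_dx(p):
    """Differentiate a polynomial given as a coefficient list."""
    return [k * c for k, c in enumerate(p)][1:]

def subtract(p, q):
    """Coefficient-wise difference p - q, padding the shorter list with zeros."""
    n = max(len(p), len(q))
    p = list(p) + [0] * (n - len(p))
    q = list(q) + [0] * (n - len(q))
    return [a - b for a, b in zip(p, q)]

f = [5, 0, -3, 2]  # the polynomial 5 - 3x^2 + 2x^3

# Apply the operator (d/dx o x) - (x o d/dx) to f; the relation says we get f back.
lhs = subtract(d_dx(times_x(f)), times_x(d_dx(f)))
print(lhs)
```

&lt;p&gt;The printed list is the coefficient list of $f$ itself, which is exactly the statement $\partial_x x - x \partial_x = 1$ applied to $f$.&lt;/p&gt;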

&lt;p&gt;We can generalize to higher dimensions: the $n$-dim. Weyl algebra is the algebra generated by  $x_1,\dots,x_n,\partial_{x_1},\dots,\partial_{x_n}$ quotiented by the relations that arise from treating them as the obvious operators on $\mathbb{F}[x_1,\dots,x_n]$.&lt;/p&gt;

&lt;h3 id=&quot;weyl-algebras-in-sage&quot;&gt;Weyl algebras in Sage&lt;/h3&gt;
&lt;p&gt;It’s easy to &lt;a href=&quot;http://doc.sagemath.org/html/en/reference/algebras/sage/algebras/weyl_algebra.html&quot; target=&quot;_blank&quot;&gt;define the Weyl algebra in Sage&lt;/a&gt;:&lt;/p&gt;

&lt;div class=&quot;linked&quot;&gt;
  &lt;script type=&quot;text/x-sage&quot;&gt;
# 3-dim Weyl algebra over QQ[x,y,z]
R.&lt;x,y,z&gt; = QQ[]
W = DifferentialWeylAlgebra(R)
W.inject_variables()
  &lt;/script&gt;
&lt;/div&gt;

&lt;p&gt;Calling &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;inject_variables&lt;/code&gt; allows us to use the operators &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;x,y,z,dx,dy,dz&lt;/code&gt; in subsequent code (where &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;dx&lt;/code&gt; denotes $\partial_x$, etc).&lt;/p&gt;

&lt;p&gt;One can do rather complicated computations:&lt;/p&gt;

&lt;div class=&quot;linked&quot;&gt;
  &lt;script type=&quot;text/x-sage&quot;&gt;
dx * dy * dz * (x + y + z)^2
  &lt;/script&gt;
&lt;/div&gt;

&lt;p&gt;By default, Sage chooses to represent monomials with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;x,y,z&lt;/code&gt; in front of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;dx,dy,dz&lt;/code&gt;:&lt;/p&gt;

&lt;div class=&quot;linked&quot;&gt;
  &lt;script type=&quot;text/x-sage&quot;&gt;
dx*x
  &lt;/script&gt;
&lt;/div&gt;

&lt;p&gt;Keep in mind that &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;x&lt;/code&gt; here is the operator “multiplication by $x$”, not the polynomial $x \in \mathbb{F}[x]$. The product &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;dx*x&lt;/code&gt; is therefore the composite operator $\partial_x x = x\partial_x + 1$, so one should not expect it to be &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;1&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;(For some reason &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;show&lt;/code&gt; does not give the right output. Try &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;show(x)&lt;/code&gt; or &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;show(x*dx)&lt;/code&gt;, for example.)&lt;/p&gt;

&lt;h2 id=&quot;representations-of-mathfraksl_2&quot;&gt;Representations of $\mathfrak{sl}_2$&lt;/h2&gt;
&lt;p&gt;It turns out that the $1$-dim. Weyl algebra gives a representation of $\mathfrak{sl}_2(\mathbb{F})$.&lt;/p&gt;

&lt;p&gt;The Lie algebra $\mathfrak{sl}_2(\mathbb{F})$ is generated by $E,F,H$ subject to the relations&lt;/p&gt;

\[[H,E] = 2E, \qquad [H,F] = -2F, \qquad [E,F] = H.\]

&lt;p&gt;Define the following elements of the $1$-dim. Weyl algebra:&lt;/p&gt;

\[E = x \partial_x^2,\qquad F = -x,\qquad H = -2x\partial_x.\]

&lt;p&gt;We can use Sage to quickly verify that these elements indeed satisfy the relations for $\mathfrak{sl}_2$ (using the commutator as the Lie bracket i.e. $[A,B] = AB - BA$):&lt;/p&gt;

&lt;div class=&quot;sage&quot;&gt;
  &lt;script type=&quot;text/x-sage&quot;&gt;
R.&lt;x&gt; = QQ[]
W = DifferentialWeylAlgebra(R)
W.inject_variables()

E = x*dx^2
F = -x
H = -2*x*dx

print(H*E - E*H == 2*E)
print(H*F - F*H == -2*F)
print(E*F - F*E == H)
  &lt;/script&gt;
&lt;/div&gt;

&lt;p&gt;Working over $\mathbb{C}$, this action of $\mathfrak{sl}_2(\mathbb{C})$ makes $\mathbb{C}[x]$ a Verma module of highest weight $0$.&lt;/p&gt;

&lt;p&gt;In fact, we can make $\mathbb{C}[x]$ a Verma module of highest weight $c$ for any $c \in \mathbb{C}$ by using:&lt;/p&gt;

\[E = (x \partial_x - c)\partial_x,\qquad F = -x,\qquad H = -2x\partial_x + c.\]

&lt;p&gt;We verify this again in Sage:&lt;/p&gt;

&lt;div class=&quot;sage&quot;&gt;
  &lt;script type=&quot;text/x-sage&quot;&gt;
Fc.&lt;c&gt; = CC[] # This allows c to be a complex indeterminate 
R.&lt;x&gt; = Fc[]
W = DifferentialWeylAlgebra(R)
W.inject_variables()

E = (x*dx-c)*dx
F = -x
H = -2*x*dx + c

print(H*E - E*H == 2*E)
print(H*F - F*H == -2*F)
print(E*F - F*E == H)
  &lt;/script&gt;
&lt;/div&gt;

&lt;p&gt;In subsequent posts, I’ll talk more about defining other non-commutative algebras in Sage and Singular.&lt;/p&gt;
</description>
        <pubDate>Wed, 17 Feb 2016 00:00:00 +0000</pubDate>
        <link>http://sheaves.github.io/Weyl-Algebra/</link>
        <guid isPermaLink="true">http://sheaves.github.io/Weyl-Algebra/</guid>
      </item>
      
    
      
      <item>
        <title>Character Theory Basics</title>
        <description>&lt;p&gt;This post illustrates some of SageMath’s character theory functionality, as well as some basic results about characters of finite groups.&lt;/p&gt;

&lt;!--more--&gt;

&lt;h2 id=&quot;basic-definitions-and-properties&quot;&gt;Basic Definitions and Properties&lt;/h2&gt;

&lt;p&gt;Given a representation $(V,\rho)$ of a group $G$, its &lt;a href=&quot;http://en.wikipedia.org/wiki/Character_theory&quot; target=&quot;_blank&quot;&gt;&lt;strong&gt;character&lt;/strong&gt;&lt;/a&gt; is a map $ \chi: G \to \mathbb{C}$ that returns the &lt;a href=&quot;http://en.wikipedia.org/wiki/Trace_(linear_algebra)&quot; target=&quot;_blank&quot;&gt;trace&lt;/a&gt; of the matrices given by $\rho$:&lt;/p&gt;

\[\chi(g) = \text{trace}(\rho(g)).\]

&lt;p&gt;A character $\chi$ is &lt;strong&gt;irreducible&lt;/strong&gt; if the corresponding $(V,\rho)$ is &lt;a href=&quot;/Representation-Theory-Irreducibility-Indecomposability/&quot; target=&quot;_blank&quot;&gt;irreducible&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Despite the simplicity of the definition, the (irreducible) characters of a group contain a surprising amount of information about the group. Some &lt;a href=&quot;http://en.wikipedia.org/wiki/Character_theory#Applications&quot; target=&quot;_blank&quot;&gt;big theorems&lt;/a&gt; in group theory depend heavily on character theory.&lt;/p&gt;

&lt;p&gt;Let’s calculate the character of the permutation representation of $D_4$. For each $g \in G$, we’ll display the pairs:&lt;/p&gt;

\[[\rho(g),\chi(g)]\]

&lt;p&gt;&lt;em&gt;(The Sage cells in this post are linked, so things may not work if you don’t execute them in order.)&lt;/em&gt;&lt;/p&gt;

&lt;div class=&quot;linked&quot;&gt;
  &lt;script type=&quot;text/x-sage&quot;&gt;
# Define group and its permutation representation
G = DihedralGroup(4)

def rho(g):
    return g.matrix()

# Define a function that returns the character of a representation
def character(rho):
    def chi(g):
        return rho(g).trace()
    return chi

# Compute the character
chi = character(rho)

for g in G:
    show([rho(g),chi(g)])
  &lt;/script&gt;
&lt;/div&gt;
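
&lt;p&gt;For a permutation representation, $\rho(g)$ is a permutation matrix, so its trace simply counts the points fixed by $g$. The sketch below, in plain Python rather than Sage, rebuilds $D_4$ as permutations of the vertices $0,1,2,3$ of a square and computes the character that way. The choice of generators and all function names are assumptions made for this illustration, so the labelling of elements may differ from Sage’s.&lt;/p&gt;

```python
def compose(p, q):
    """Return the permutation 'p after q' on points 0..n-1 (tuples of images)."""
    return tuple(p[q[i]] for i in range(len(q)))

def generate(gens):
    """Close a list of permutations under composition (fine for small groups)."""
    group = set(gens)
    frontier = list(gens)
    while frontier:
        g = frontier.pop()
        for s in gens:
            h = compose(g, s)
            if h not in group:
                group.add(h)
                frontier.append(h)
    return group

def perm_matrix(p):
    """Permutation matrix sending basis vector j to basis vector p[j]."""
    n = len(p)
    return [[1 if p[j] == i else 0 for j in range(n)] for i in range(n)]

def trace(m):
    return sum(m[i][i] for i in range(len(m)))

rotation = (1, 2, 3, 0)    # rotate the square by a quarter turn
reflection = (0, 3, 2, 1)  # reflect across the diagonal through vertices 0 and 2
D4 = generate([rotation, reflection])

# The character is the trace of the permutation matrix of each element.
chi = {g: trace(perm_matrix(g)) for g in D4}
print(sorted(chi.values(), reverse=True))
```

&lt;p&gt;The values come out as one $4$ (the identity), two $2$s (the reflections through a pair of opposite vertices) and five $0$s.&lt;/p&gt;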

&lt;p&gt;Many of the following properties of characters can be deduced from properties of the trace:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;The &lt;strong&gt;dimension&lt;/strong&gt; of a character is the dimension of $V$ in $(V,\rho)$. Since $\rho(\text{Id})$ is always the identity matrix, the dimension of $\chi$ is $\chi(\text{Id})$.&lt;/li&gt;
  &lt;li&gt;Because the trace is &lt;a href=&quot;http://en.wikipedia.org/wiki/Similarity_invariance&quot; target=&quot;_blank&quot;&gt;invariant under similarity transformations&lt;/a&gt;, $\chi(hgh^{-1}) = \chi(g)$ for all $g,h \in G$. So characters are constant on conjugacy classes, and are thus &lt;a href=&quot;http://en.wikipedia.org/wiki/Class_function&quot; target=&quot;_blank&quot;&gt;&lt;strong&gt;class functions&lt;/strong&gt;&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;Let $\chi_V$ denote the character of $(V,\rho)$. Recalling the definitions of &lt;a href=&quot;/Representation-Theory-Sums-Products/&quot; target=&quot;_blank&quot;&gt;direct sums and tensor products&lt;/a&gt;, we see that&lt;/li&gt;
&lt;/ol&gt;

\[\begin{align*}
  \chi_{V_1 \oplus V_2} &amp;amp;= \chi_{V_1} + \chi_{V_2} \\
  \chi_{V_1 \otimes V_2} &amp;amp;= \chi_{V_1} \times \chi_{V_2}
\end{align*}\]

&lt;h2 id=&quot;the-character-table&quot;&gt;The Character Table&lt;/h2&gt;

&lt;p&gt;Let’s ignore the representation $\rho$ for now, and just look at the character $\chi$:&lt;/p&gt;

&lt;div class=&quot;linked&quot;&gt;
  &lt;script type=&quot;text/x-sage&quot;&gt;
print(&quot;chi  g&quot;)
table([[chi(g),g] for g in G])
  &lt;/script&gt;
&lt;/div&gt;

&lt;p&gt;This is succinct, but we can make it even shorter. From point 2 above, $\chi$ is constant on conjugacy classes of $G$, so we don’t lose any information by just looking at the values of $\chi$ on each conjugacy class:&lt;/p&gt;

&lt;div class=&quot;linked&quot;&gt;
  &lt;script type=&quot;text/x-sage&quot;&gt;
print(&quot;chi  conjugacy class&quot;)
table([[chi(C[0]),C.list()] for C in G.conjugacy_classes()])
  &lt;/script&gt;
&lt;/div&gt;

&lt;p&gt;Even shorter, let’s just display the values of $\chi$:&lt;/p&gt;

&lt;div class=&quot;linked&quot;&gt;
  &lt;script type=&quot;text/x-sage&quot;&gt;
[chi(g) for g in G.conjugacy_classes_representatives()]
  &lt;/script&gt;
&lt;/div&gt;

&lt;p&gt;This single row of numbers represents the character of &lt;em&gt;one&lt;/em&gt; representation of $G$. If we knew all the irreducible representations of $G$ and their corresponding characters, we could form a table with one row for each character. This is called the &lt;a href=&quot;http://en.wikipedia.org/wiki/Character_table&quot; target=&quot;_blank&quot;&gt;&lt;strong&gt;character table&lt;/strong&gt;&lt;/a&gt; of $G$.&lt;/p&gt;

&lt;p&gt;Remember how we had to define our representations by hand, one by one? We don’t have to do that for characters, because  SageMath has the &lt;a href=&quot;http://www.sagemath.org/doc/constructions/rep_theory.html&quot; target=&quot;_blank&quot;&gt;character tables of small groups built-in&lt;/a&gt;:&lt;/p&gt;

&lt;div class=&quot;linked&quot;&gt;
  &lt;script type=&quot;text/x-sage&quot;&gt;
char_table = G.character_table()
char_table
  &lt;/script&gt;
&lt;/div&gt;

&lt;p&gt;This just goes to show how important the characters of a group are. We can also access individual characters as functions. Let’s say we want the last one:&lt;/p&gt;

&lt;div class=&quot;linked&quot;&gt;
  &lt;script type=&quot;text/x-sage&quot;&gt;
c = G.character(char_table[4])

print(&quot;c(g) for each g in G:&quot;)
print([c(g) for g in G])

print(&quot;c(g) for each conjugacy class:&quot;)
print([c(g) for g in G.conjugacy_classes_representatives()])
  &lt;/script&gt;
&lt;/div&gt;

&lt;p&gt;Notice that the character we were playing with, $[4,2,0,0,0]$, is not in the table. This is because its representation $\rho$  is not irreducible. At the end of the post on &lt;a href=&quot;/Representation-Theory-Decomposing-Representations/&quot; target=&quot;_blank&quot;&gt;decomposing representations&lt;/a&gt;, we saw that $\rho$ splits into two $1$-dimensional irreducible representations and one $2$-dimensional one. It’s not hard to see that the character of $\rho$ is the sum of rows 1,4 and 5 in our character table:&lt;/p&gt;

&lt;div class=&quot;linked&quot;&gt;
  &lt;script type=&quot;text/x-sage&quot;&gt;
c1 = G.character(char_table[0])
c4 = G.character(char_table[3])
c5 = G.character(char_table[4])

c = c1 + c4 + c5

print(&quot;c1 + c4 + c5:&quot;)
print([c(g) for g in G.conjugacy_classes_representatives()])

print(&quot;chi:&quot;)
print([chi(g) for g in G.conjugacy_classes_representatives()])
  &lt;/script&gt;
&lt;/div&gt;

&lt;p&gt;Just as we could decompose every representation of $G$ into a sum of irreducible representations, we can express any character as a sum of irreducible characters.&lt;/p&gt;

&lt;p&gt;The next post discusses how to do this easily, by making use of the &lt;a href=&quot;http://en.wikipedia.org/wiki/Schur_orthogonality_relations&quot;&gt;Schur orthogonality relations&lt;/a&gt;. These are really cool relations among the rows and columns of the character table. Apart from decomposing representations into irreducibles, we’ll also be able to prove that the character table is always square!&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Edit:&lt;/strong&gt; The promised “next post” about these topics never happened. Maybe sometime in the far future, I might come back to these topics, but no promises for now!&lt;/p&gt;

</description>
        <pubDate>Fri, 20 Mar 2015 00:00:00 +0000</pubDate>
        <link>http://sheaves.github.io/Character-Theory/</link>
        <guid isPermaLink="true">http://sheaves.github.io/Character-Theory/</guid>
      </item>
      
    
      
      <item>
        <title>Animated GIFs</title>
        <description>&lt;p&gt;I really should be posting about character theory, but I got distracted making some aesthetic changes to this blog (new icon and favicon!) and creating animations like this:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/images/harmonograph_loop.gif&quot; alt=&quot;harmonograph&quot; /&gt;&lt;/p&gt;

&lt;!--more--&gt;

&lt;div class=&quot;no_out&quot;&gt;
  &lt;script type=&quot;text/x-sage&quot;&gt;
d,c,p,k = [0.01, 0.05, -0.15, 0.05]
x(t,u) = (sin(t*2*pi) + sin((1-c + u*c*2)*t*2*pi) + p*pi)*exp(-d*t)
y(t,u) = (sin((1-c+ 0.55*c*2)*t*2*pi + k*(1-u)*pi) + cos((1-c + 0.9*c*2)*t*2*pi) + p*pi)*exp(-d*t)
  
a = animate([parametric_plot((x(t,u),y(t,u)),(t,0,30),color = [u,1-u,0.52], axes= False, plot_points = 200) for u in [0.48+0.45*sin(v) for v in srange(0,2*pi,0.1)]])
a.gif(savefile=&apos;my_animation.gif&apos;, delay=20, iterations=0)
  &lt;/script&gt;
&lt;/div&gt;

&lt;p&gt;I’m not putting this in a SageCell because this could take quite a while, especially if you increase the number of frames (by changing the parameters in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;srange&lt;/code&gt;), but feel free to try it out on your own copy of Sage. It saves an animated GIF that loops forever (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;iterations = 0&lt;/code&gt;) at the location specified by &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;savefile&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;For more information, check out the &lt;a href=&quot;http://www.sagemath.org/doc/reference/plotting/sage/plot/animate.html&quot;&gt;Sage reference for animated plots&lt;/a&gt;.&lt;/p&gt;
</description>
        <pubDate>Thu, 12 Mar 2015 00:00:00 +0000</pubDate>
        <link>http://sheaves.github.io/Animations/</link>
        <guid isPermaLink="true">http://sheaves.github.io/Animations/</guid>
      </item>
      
    
      
      <item>
        <title>The Group Ring and the Regular Representation</title>
        <description>&lt;p&gt;In the &lt;a href=&quot;/Representation-Theory-Decomposing-Representations/&quot; target=&quot;_blank&quot;&gt;previous post&lt;/a&gt;, we saw how to decompose a given group representation into irreducibles. But we still don’t know much about the irreducible representations of a (finite) group. What do they look like? How many are there? Infinitely many?&lt;/p&gt;

&lt;p&gt;In this post, we’ll construct the &lt;a href=&quot;http://en.wikipedia.org/wiki/Group_ring&quot; target=&quot;_blank&quot;&gt;group ring&lt;/a&gt; of a group. Treating this as a vector space, we get the &lt;a href=&quot;http://en.wikipedia.org/wiki/Regular_representation&quot; target=&quot;_blank&quot;&gt;regular representation&lt;/a&gt;, which turns out to contain &lt;em&gt;all&lt;/em&gt; the irreducible representations of $G$!&lt;/p&gt;

&lt;!--more--&gt;

&lt;h2 id=&quot;the-group-ring-fg&quot;&gt;The group ring $FG$&lt;/h2&gt;

&lt;p&gt;Given a (finite) group $G$ and a field $F$, we can treat each element of $G$ as a basis element of a vector space over $F$. The resulting vector space generated by $g \in G$ is&lt;/p&gt;

\[FG := \left\{\sum_{g\in G} \alpha_g g: \alpha_g \in F \right\}.\]

&lt;p&gt;Let’s do this in Sage with the group $G = D_4$ and the field $F = \mathbb{Q}$:&lt;/p&gt;

&lt;p&gt;&lt;em&gt;(The Sage cells in this post are linked, so things may not work if you don’t execute them in order.)&lt;/em&gt;&lt;/p&gt;

&lt;div class=&quot;linked&quot;&gt;
  &lt;script type=&quot;text/x-sage&quot;&gt;
G = DihedralGroup(4)
F = QQ

FG = GroupAlgebra(G,F)

v = FG.an_element()
v
  &lt;/script&gt;
&lt;/div&gt;

&lt;p&gt;We can view $v \in FG$ as a vector in $F^n$, where $n$ is the size of $G$:&lt;/p&gt;

&lt;div class=&quot;linked&quot;&gt;
  &lt;script type=&quot;text/x-sage&quot;&gt;
v.to_vector()
  &lt;/script&gt;
&lt;/div&gt;

&lt;p&gt;Here, we’re treating each $g \in G$ as a basis element of $FG$:&lt;/p&gt;

&lt;div class=&quot;linked&quot;&gt;
  &lt;script type=&quot;text/x-sage&quot;&gt;
for g in G:
    g = FG(g)
    print(&quot;{} = {}&quot;.format(g.to_vector(),g))
  &lt;/script&gt;
&lt;/div&gt;

&lt;p&gt;Vectors in $FG$ are added component-wise:&lt;/p&gt;

\[\left(\sum_{g \in G} \alpha_g g\right) + \left(\sum_{g\in G} \beta_g g\right) = \sum_{g \in G} (\alpha_g+\beta_g) g.\]

&lt;div class=&quot;linked&quot;&gt;
  &lt;script type=&quot;text/x-sage&quot;&gt;
w = FG.random_element()  
print(&apos;w = {}&apos;.format(w.to_vector()))
print(&apos;v + w = {}&apos;.format((v + w).to_vector()))
  &lt;/script&gt;
&lt;/div&gt;

&lt;h2 id=&quot;multiplication-as-a-linear-transformation&quot;&gt;Multiplication as a linear transformation&lt;/h2&gt;

&lt;p&gt;In fact, $FG$ is also a &lt;em&gt;ring&lt;/em&gt; (called the &lt;a href=&quot;http://en.wikipedia.org/wiki/Group_ring&quot; target=&quot;_blank&quot;&gt;&lt;strong&gt;group ring&lt;/strong&gt;&lt;/a&gt;), because we can multiply vectors using the multiplication rule of the group $G$:&lt;/p&gt;

\[\left(\sum_{h \in G} \alpha_h h\right) \left(\sum_{g\in G} \beta_g g\right) = \sum_{h,g \in G} (\alpha_h \beta_g) hg.\]

&lt;div class=&quot;linked&quot;&gt;
  &lt;script type=&quot;text/x-sage&quot;&gt;
print(&apos;v * w = {}&apos;.format((v * w).to_vector()))
  &lt;/script&gt;
&lt;/div&gt;
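
&lt;p&gt;The multiplication rule is easy to implement directly. Below is a rough sketch in plain Python, assuming group elements are permutations stored as tuples and elements of $FG$ are dictionaries mapping group elements to coefficients. Here &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;compose&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;multiply&lt;/code&gt; are hypothetical helper names, not Sage functions.&lt;/p&gt;

```python
from fractions import Fraction

def compose(p, q):
    """Group law: the permutation 'p after q', with permutations as tuples."""
    return tuple(p[q[i]] for i in range(len(q)))

def multiply(v, w):
    """Product in the group ring; elements are dicts {group element: coefficient}."""
    product = {}
    for h, alpha in v.items():
        for g, beta in w.items():
            hg = compose(h, g)
            # Collect the coefficient alpha*beta on the group element hg.
            product[hg] = product.get(hg, 0) + alpha * beta
    return {g: c for g, c in product.items() if c != 0}

identity = (0, 1, 2, 3)
r = (1, 2, 3, 0)   # quarter-turn rotation of the square
s = (0, 3, 2, 1)   # a reflection

v = {identity: Fraction(1), r: Fraction(2)}
w = {s: Fraction(3), r: Fraction(-1, 2)}
print(multiply(v, w))
```

&lt;p&gt;Expanding by hand gives $(1 + 2r)(3s - \tfrac{1}{2}r) = 3s - \tfrac{1}{2}r + 6rs - r^2$, which is what the dictionary encodes.&lt;/p&gt;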

&lt;p&gt;That wasn’t very illuminating. However, treating multiplication by $v \in FG$ as a function&lt;/p&gt;

\[\begin{align*}
T_v: FG &amp;amp;\to FG \\
w &amp;amp;\mapsto vw,
\end{align*}\]

&lt;p&gt;one can check that each $T_v$ is a linear transformation! We can thus represent $T_v$ as a matrix whose columns are $T_v(g), g \in G$:&lt;/p&gt;

&lt;div class=&quot;linked&quot;&gt;
  &lt;script type=&quot;text/x-sage&quot;&gt;
for g in G:
    g = FG(g)
    print(&quot;v*{} = {}&quot;.format(g.to_vector(),(v*g).to_vector()))

T = matrix([(v*FG(g)).to_vector() for g in G]).transpose()
show(T)
  &lt;/script&gt;
&lt;/div&gt;

&lt;h2 id=&quot;the-regular-representation&quot;&gt;The regular representation&lt;/h2&gt;

&lt;p&gt;We’re especially interested in $T_g, g \in G$. These are invertible, with inverse $T_{g^{-1}}$, and their matrices are all permutation matrices, because multiplying by $g \in G$ simply permutes elements of $G$:&lt;/p&gt;

&lt;div class=&quot;linked&quot;&gt;
  &lt;script type=&quot;text/x-sage&quot;&gt;
for v in G:
    v = FG(v)
    show(matrix([(v*FG(g)).to_vector() for g in G]).transpose())
  &lt;/script&gt;
&lt;/div&gt;

&lt;p&gt;Define a function $\rho_{FG}$ which assigns to each $g\in G$ the corresponding $T_g$:&lt;/p&gt;

\[\begin{align*}
\rho_{FG}: G &amp;amp;\to \mathrm{GL}(FG) \\
g &amp;amp;\mapsto T_g
\end{align*}\]

&lt;p&gt;Then $(FG,\rho_{FG})$ is the &lt;a href=&quot;http://en.wikipedia.org/wiki/Regular_representation&quot; target=&quot;_blank&quot;&gt;&lt;strong&gt;regular representation&lt;/strong&gt;&lt;/a&gt; of $G$ over $F$.&lt;/p&gt;

&lt;p&gt;The regular representation of any non-trivial group is not irreducible. In fact, over an algebraically closed field of characteristic zero (such as $\overline{\mathbb{Q}}$ or $\mathbb{C}$), it is a direct sum of &lt;em&gt;all&lt;/em&gt; the irreducible representations of $G$! What’s more, if $(V,\rho)$ is an irreducible representation of $G$ and $\dim V = k$, then $V$ occurs $k$ times in the direct-sum decomposition of $FG$!&lt;/p&gt;
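
&lt;p&gt;Comparing dimensions on both sides of this decomposition gives a useful check. Writing $V_1, \dots, V_m$ for the pairwise non-isomorphic irreducible representations of $G$ (notation introduced here only for this remark), and working over an algebraically closed field of characteristic zero, each $V_i$ contributes $\dim V_i$ copies of dimension $\dim V_i$:&lt;/p&gt;

\[|G| = \dim FG = \sum_{i=1}^{m} (\dim V_i)^2.\]

&lt;p&gt;For example, $D_4$ has order $8$ and its irreducible representations have dimensions $1, 1, 1, 1, 2$, which is consistent with $8 = 1^2 + 1^2 + 1^2 + 1^2 + 2^2$.&lt;/p&gt;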

&lt;p&gt;Let’s apply the decomposition algorithm in the &lt;a href=&quot;/Representation-Theory-Decomposing-Representations/&quot; target=&quot;_blank&quot;&gt;previous post&lt;/a&gt; to $(FG,\rho_{FG})$ (this might take a while to run):&lt;/p&gt;

&lt;div class=&quot;sage&quot;&gt;
  &lt;script type=&quot;text/x-sage&quot;&gt;
# Define group and its regular representation
G = DihedralGroup(4)
FG = GroupAlgebra(G,QQbar)

def rho(h):
    h = FG(h)
    return matrix([(h*FG(g)).to_vector() for g in G]).transpose()    
    
# Decomposition algorithms
import numpy as np

def is_irreducible(rho,G, n= None):
  &quot;&quot;&quot;
  If rho is irreducible, returns (True, I)  where I is the n-by-n identity matrix, n = dimension of rho.
  Otherwise, returns (False, H) where H is a non-scalar matrix that commutes with rho(G).
  &quot;&quot;&quot;
  # Compute the dimension of the representation
  if n is None:
      n = rho(G.identity()).dimensions()[0]
  
  # Run through all r,s = 0,1,...,n-1 (indices are 0-based in the code)
  for r in range(n):
      for s in range(n):
          # Define H_rs
          H_rs = matrix.zero(QQbar,n)
          if r == s:
              H_rs[r,s] = 1
          elif r &gt; s:
              H_rs[r,s] = 1
              H_rs[s,r] = 1
          else: # r &lt; s
              H_rs[r,s] = I
              H_rs[s,r] = -I
          
          # Compute H
          H = sum([rho(g).conjugate_transpose()*H_rs*rho(g) for g in G])/G.cardinality()
          
          # Check if H is scalar
          if H[0,0]*matrix.identity(n) != H:
              return False,H
  
  # If all H are scalar
  return True, matrix.identity(n)

def decompose(rho,G,H):
    &quot;&quot;&quot;
    Uses the eigenspaces of H to decompose rho into subrepresentations.
    Returns a change of basis matrix P and the indices of the block-decomposition of rho in this basis.
    &quot;&quot;&quot;
    
    # Compute J,P such that H = PJP^(-1)
    J,P = H.jordan_form(QQbar,transformation=True)

    # Compute block subdivisions
    edges = []
    for g in G:
        edges += (P.conjugate_transpose()*rho(g)*P).nonzero_positions()
    graph = Graph(edges, multiedges = False, loops = True)
    subrep_indices = sorted(graph.connected_components(), key=lambda x: x[0])    
    
    return P,subrep_indices  

def irr_decompose(rho,G,index = None):
    &quot;&quot;&quot;
    Decomposes rho into irreducible representations of G.
    Returns a change of basis matrix P and the indices of the block-decomposition of rho in this basis.
    &quot;&quot;&quot;
    n = rho(G.identity()).dimensions()[0]
    if index is None:
        index = range(n)
        
    # Test for irreducibility
    is_irred, H = is_irreducible(rho,G,n)
    
    if is_irred:
        subrep_indices = list(np.array(index)[range(n)])
        return H, [subrep_indices]
    else:
        P, subrep_indices = decompose(rho,G,H)
        print([list(np.array(index)[subrep_index]) for subrep_index in subrep_indices])

        new_subrep_indices = []
        new_P_list = []
        
        for subrep_index in subrep_indices:
            
            def subrep(g):
                return (P.inverse()*rho(g)*P)[subrep_index,subrep_index]
            new_P, new_indices = irr_decompose(subrep,G, list(np.array(index)[subrep_index]))
            
            new_subrep_indices += new_indices
            new_P_list += [new_P]
        
        return P*block_diagonal_matrix(new_P_list), new_subrep_indices

def show_irreps(rho,G,P,irrep_indices):
    subdivisions = [i for subrep_index in irrep_indices for i in subrep_index][1:]
    for subrep in irrep_indices:
        for i in subrep[1:]:
            subdivisions.remove(i)

    # Display rho in block-diagonal form
    for g in G:
        M = P.inverse()*rho(g)*P
        M == M*1 # Just for aesthetics
        M.subdivide(subdivisions, subdivisions)
        show(M)

# Execute!
P,irrep_indices = irr_decompose(rho,G)
print(irrep_indices)
show_irreps(rho,G,P,irrep_indices)    
  &lt;/script&gt;
&lt;/div&gt;

&lt;p&gt;So the regular representation of $D_4$ decomposes into four (distinct) $1$-dim representations and two (isomorphic) $2$-dim ones.&lt;/p&gt;

&lt;h2 id=&quot;building-character&quot;&gt;Building character&lt;/h2&gt;

&lt;p&gt;We’ve spent a lot of time working directly with representations of a group. While more concrete, the actual matrix representations themselves tend to be a little clumsy, especially when the groups in question get large.&lt;/p&gt;

&lt;p&gt;In the next few posts, I’ll switch gears to &lt;a href=&quot;http://en.wikipedia.org/wiki/Character_theory&quot; target=&quot;_blank&quot;&gt;character theory&lt;/a&gt;, which is a simpler but more powerful way of working with group representations.&lt;/p&gt;
</description>
        <pubDate>Sun, 15 Feb 2015 00:00:00 +0000</pubDate>
        <link>http://sheaves.github.io/Group-Ring-Regular-Representation/</link>
        <guid isPermaLink="true">http://sheaves.github.io/Group-Ring-Regular-Representation/</guid>
      </item>
      
    
      
      <item>
        <title>Decomposing Representations</title>
        <description>&lt;p&gt;In this post, we’ll implement an algorithm for decomposing representations that &lt;a href=&quot;http://www.ams.org/journals/mcom/1970-24-111/S0025-5718-1970-0280611-6/S0025-5718-1970-0280611-6.pdf&quot; target=&quot;_blank&quot;&gt;Dixon published in 1970&lt;/a&gt;.&lt;/p&gt;

&lt;!--more--&gt;

&lt;p&gt;As a motivating example, I’ll use the permutation matrix representation of $D_4$ that we saw in an &lt;a href=&quot;/Representation-Theory-Intro/&quot; target=&quot;_blank&quot;&gt;earlier post&lt;/a&gt;. To make the code more generally applicable, let’s call the group $G$ and the representation $\rho$:&lt;/p&gt;

&lt;p&gt;&lt;em&gt;(The Sage cells in this post are linked, so things may not work if you don’t execute them in order.)&lt;/em&gt;&lt;/p&gt;

&lt;div class=&quot;linked&quot;&gt;
  &lt;script type=&quot;text/x-sage&quot;&gt;
G = DihedralGroup(4)

# Defining the permutation representation
def rho(g):
    return g.matrix()

g = G.an_element()
show(rho(g))
  &lt;/script&gt;
&lt;/div&gt;

&lt;p&gt;We’ll see that this is decomposable, and find out what its irreducible components are.&lt;/p&gt;

&lt;h3 id=&quot;unitary-representations&quot;&gt;Unitary representations&lt;/h3&gt;

&lt;p&gt;A short remark before we begin: the algorithm assumes that $\rho$ is a &lt;a href=&quot;http://en.wikipedia.org/wiki/Unitary_representation&quot; target=&quot;_blank&quot;&gt;unitary representation&lt;/a&gt;, i.e. for all $g \in G$,&lt;/p&gt;

\[\rho(g)^* \rho(g) = \rho(g) \rho(g)^* = I,\]

&lt;p&gt;where $A^*$ is the &lt;a href=&quot;http://en.wikipedia.org/wiki/Conjugate_transpose&quot; target=&quot;_blank&quot;&gt;conjugate transpose&lt;/a&gt; of a matrix $A$. 
For $G$ a finite group, every complex representation can be made unitary under an appropriate change of basis, so we need not be too concerned about this. In any case, permutation representations are always unitary, so we can proceed with our example.&lt;/p&gt;

&lt;h2 id=&quot;finding-non-scalar-commuting-matrices&quot;&gt;Finding non-scalar, commuting matrices&lt;/h2&gt;

&lt;p&gt;At the end of the &lt;a href=&quot;/Representation-Theory-Irreducibility-Indecomposability/&quot; target=&quot;_blank&quot;&gt;previous post&lt;/a&gt; we saw that in order to decompose a representation $(V,\rho)$, it is enough to find a non-scalar matrix $T$ that commutes with $\rho(g)$ for every $g \in G$. This first step finds a &lt;a href=&quot;http://en.wikipedia.org/wiki/Hermitian_matrix&quot; target=&quot;_blank&quot;&gt;Hermitian&lt;/a&gt; non-scalar $H$ that commutes with $\rho(G)$ (if there is one to be found).&lt;/p&gt;

&lt;p&gt;Let $E_{rs}$ denote the $n \times n$ matrix with a $1$ in the $(r,s)$th entry and zeros everywhere else. Here $n$ is the dimension of $V$ in the representation $(V,\rho)$. Define&lt;/p&gt;

\[H_{rs} = \begin{cases}
E_{rr} &amp;amp;\text{if } r = s \\
E_{rs} + E_{sr} &amp;amp;\text{if } r &amp;gt; s \\
i(E_{rs} - E_{sr}) &amp;amp;\text{if } r &amp;lt; s,
\end{cases}\]

&lt;p&gt;Then each $H_{rs}$ is Hermitian, and together the $H_{rs}$ form a basis for the space of $n \times n$ matrices over $\mathbb{C}$.&lt;/p&gt;
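
&lt;p&gt;To make the definition concrete, here is the case $n = 2$ written out (a worked example added for illustration). The four matrices are:&lt;/p&gt;

\[H_{11} = \begin{pmatrix} 1 &amp;amp; 0 \\ 0 &amp;amp; 0 \end{pmatrix}, \quad
H_{22} = \begin{pmatrix} 0 &amp;amp; 0 \\ 0 &amp;amp; 1 \end{pmatrix}, \quad
H_{21} = \begin{pmatrix} 0 &amp;amp; 1 \\ 1 &amp;amp; 0 \end{pmatrix}, \quad
H_{12} = \begin{pmatrix} 0 &amp;amp; i \\ -i &amp;amp; 0 \end{pmatrix}.\]

&lt;p&gt;Each one equals its own conjugate transpose, and an arbitrary complex $2 \times 2$ matrix can be written in terms of them:&lt;/p&gt;

\[\begin{pmatrix} a &amp;amp; b \\ c &amp;amp; d \end{pmatrix} = a\, H_{11} + d\, H_{22} + \frac{b + c}{2}\, H_{21} + \frac{b - c}{2i}\, H_{12}.\]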

&lt;p&gt;Now for each $r,s$, compute the sum&lt;/p&gt;

\[H = \frac{1}{|G|} \sum_{g \in G} \,\, \rho(g)^* \, H_{rs} \, \rho(g).\]

&lt;p&gt;Observe that $H$ has the following properties:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;it is Hermitian&lt;/li&gt;
  &lt;li&gt;it commutes with $\rho(g)$ for all $g \in G$&lt;/li&gt;
&lt;/ul&gt;
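
&lt;p&gt;Both properties can be verified directly. A sketch, using only that each $H_{rs}$ is Hermitian and that $\rho$ is a unitary homomorphism: taking the conjugate transpose term by term,&lt;/p&gt;

\[H^* = \frac{1}{|G|} \sum_{g \in G} \rho(g)^* \, H_{rs}^* \, \rho(g) = \frac{1}{|G|} \sum_{g \in G} \rho(g)^* \, H_{rs} \, \rho(g) = H,\]

&lt;p&gt;and, for any $h \in G$, substituting $k = gh$ (which runs over all of $G$ as $g$ does),&lt;/p&gt;

\[\rho(h)^* \, H \, \rho(h) = \frac{1}{|G|} \sum_{g \in G} \rho(gh)^* \, H_{rs} \, \rho(gh) = \frac{1}{|G|} \sum_{k \in G} \rho(k)^* \, H_{rs} \, \rho(k) = H.\]

&lt;p&gt;Multiplying on the left by $\rho(h)$ and using $\rho(h)\rho(h)^* = I$ gives $H \rho(h) = \rho(h) H$.&lt;/p&gt;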

&lt;p&gt;If $\rho$ is irreducible, then $H$ is a scalar matrix for all $r,s$. Otherwise, it turns out that there &lt;strong&gt;will&lt;/strong&gt; be some $r,s$ such that $H$ is non-scalar (this is due to the fact that the $H_{rs}$ matrices form a basis of the $n \times n$ matrices).&lt;/p&gt;

&lt;p&gt;Let’s test this algorithm on our permutation representation of $D_4$:&lt;/p&gt;

&lt;div class=&quot;linked&quot;&gt;
  &lt;script type=&quot;text/x-sage&quot;&gt;
def is_irreducible(rho,G, n= None):
  &quot;&quot;&quot;
  If rho is irreducible, returns (True, I)  where I is the n-by-n identity matrix, n = dimension of rho.
  Otherwise, returns (False, H) where H is a non-scalar matrix that commutes with rho(G).
  &quot;&quot;&quot;
  # Compute the dimension of the representation
  if n is None:
      n = rho(G.identity()).dimensions()[0]
  
  # Run through all r,s = 0,1,...,n-1 (indices are 0-based in the code)
  for r in range(n):
      for s in range(n):
          # Define H_rs
          H_rs = matrix.zero(QQbar,n)
          if r == s:
              H_rs[r,s] = 1
          elif r &gt; s:
              H_rs[r,s] = 1
              H_rs[s,r] = 1
          else: # r &lt; s
              H_rs[r,s] = I
              H_rs[s,r] = -I
          
          # Compute H
          H = sum([rho(g).conjugate_transpose()*H_rs*rho(g) for g in G])/G.cardinality()
          
          # Check if H is scalar
          if H[0,0]*matrix.identity(n) != H:
              return False,H
  
  # If all H are scalar
  return True, matrix.identity(n)

is_irred,H = is_irreducible(rho,G) 

show(is_irred)
show(H)
  &lt;/script&gt;
&lt;/div&gt;

&lt;p&gt;We get a non-scalar $H$! So the permutation representation of $D_4$ is reducible!&lt;/p&gt;

&lt;h2 id=&quot;using-h-to-decompose-rho&quot;&gt;Using $H$ to decompose $\rho$&lt;/h2&gt;

&lt;p&gt;Our next step is to use the eigenspaces of $H$ to decompose $\rho$. At the end of the &lt;a href=&quot;/Representation-Theory-Irreducibility-Indecomposability/&quot; target=&quot;_blank&quot;&gt;previous post&lt;/a&gt;, we saw that $\rho(g)$ preserves the eigenspaces of $H$, so we need only find the eigenspaces of $H$ to decompose $\rho$.&lt;/p&gt;
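
&lt;p&gt;As a one-line reminder of why the eigenspaces are preserved: if $Hv = \lambda v$, then because $H$ commutes with $\rho(g)$,&lt;/p&gt;

\[H \big( \rho(g) v \big) = \rho(g) \big( H v \big) = \lambda \, \rho(g) v,\]

&lt;p&gt;so $\rho(g) v$ lies in the same eigenspace as $v$.&lt;/p&gt;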

&lt;p&gt;Since $H$ is Hermitian, it is &lt;a href=&quot;http://en.wikipedia.org/wiki/Diagonalizable_matrix&quot; target=&quot;_blank&quot;&gt;diagonalizable&lt;/a&gt;, so its eigenvectors form a basis of $V$. In fact, the eigenbasis can be chosen to be orthonormal.&lt;/p&gt;

&lt;p&gt;We can find this basis by computing the &lt;a href=&quot;http://en.wikipedia.org/wiki/Jordan_normal_form&quot; target=&quot;_blank&quot;&gt;Jordan decomposition&lt;/a&gt; of $H$, and then orthonormalizing it:&lt;/p&gt;

&lt;div class=&quot;linked&quot;&gt;
  &lt;script type=&quot;text/x-sage&quot;&gt;
# Compute J,P such that H = PJP^(-1)
J,P = H.jordan_form(QQbar,transformation=True)
P = P.transpose().gram_schmidt(orthonormal=True)[0].transpose()

show(P)
  &lt;/script&gt;
&lt;/div&gt;

&lt;p&gt;It’s important that we orthonormalize $P$ so that $P$ becomes unitary. This will ensure that the subrepresentations remain unitary.&lt;/p&gt;
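
&lt;p&gt;A short check of this claim, assuming $P$ is unitary (so that $P^{-1} = P^*$) and $\rho(g)$ is unitary: for each $g \in G$,&lt;/p&gt;

\[\big( P^{-1} \rho(g) P \big)^* \big( P^{-1} \rho(g) P \big) = P^* \rho(g)^* P \, P^* \rho(g) P = P^* \rho(g)^* \rho(g) P = P^* P = I.\]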

&lt;p&gt;Observe that $P^{-1} \rho(g) P$ has the same block-diagonal form for each $g \in G$:&lt;/p&gt;

&lt;div class=&quot;linked&quot;&gt;
  &lt;script type=&quot;text/x-sage&quot;&gt;
# Compute block subdivisions (just for aesthetics)
edges = []
for g in G:
    edges += (P.conjugate_transpose()*rho(g)*P).nonzero_positions()
graph = Graph(edges, multiedges = False, loops = True)
subrep_indices = graph.connected_components()
subdivisions = graph.vertices()[1:]
for l in subrep_indices:
    for i in l[1:]:
        subdivisions.remove(i)
      
# Display rho in block-diagonal form
for g in G:
    M = P.inverse()*rho(g)*P
    M.subdivide(subdivisions, subdivisions)
    show(M)
  &lt;/script&gt;
&lt;/div&gt;

&lt;p&gt;We have thus decomposed $\rho$ into two 1-dimensional representations and one 2-dimensional one!&lt;/p&gt;

&lt;h2 id=&quot;decomposing-into-irreducibles&quot;&gt;Decomposing into irreducibles&lt;/h2&gt;

&lt;p&gt;Finally, to get a decomposition into irreducibles, we can apply the algorithm recursively to each of the subrepresentations to see if they decompose further.&lt;/p&gt;

&lt;p&gt;Here’s a stand-alone script that decomposes a representation into its irreducible components:&lt;/p&gt;

&lt;div class=&quot;sage&quot;&gt;
  &lt;script type=&quot;text/x-sage&quot;&gt;
# Define group and representation here
G = DihedralGroup(4)
def rho(g):
    return g.matrix()
    
# Algorithms
import numpy as np

def is_irreducible(rho,G, n= None):
  &quot;&quot;&quot;
  If rho is irreducible, returns (True, I)  where I is the n-by-n identity matrix, n = dimension of rho.
  Otherwise, returns (False, H) where H is a non-scalar matrix that commutes with rho(G).
  &quot;&quot;&quot;
  # Compute the dimension of the representation
  if n is None:
      n = rho(G.identity()).dimensions()[0]
  
  # Run through all r,s = 0,1,...,n-1 (indices are 0-based in the code)
  for r in range(n):
      for s in range(n):
          # Define H_rs
          H_rs = matrix.zero(QQbar,n)
          if r == s:
              H_rs[r,s] = 1
          elif r &gt; s:
              H_rs[r,s] = 1
              H_rs[s,r] = 1
          else: # r &lt; s
              H_rs[r,s] = I
              H_rs[s,r] = -I
          
          # Compute H
          H = sum([rho(g).conjugate_transpose()*H_rs*rho(g) for g in G])/G.cardinality()
          
          # Check if H is scalar
          if H[0,0]*matrix.identity(n) != H:
              return False,H
  
  # If all H are scalar
  return True, matrix.identity(n)

def decompose(rho,G,H):
    &quot;&quot;&quot;
    Uses the eigenspaces of H to decompose rho into subrepresentations.
    Returns a change of basis matrix P and the indices of the block-decomposition of rho in this basis.
    &quot;&quot;&quot;
    
    # Compute J,P such that H = PJP^(-1)
    J,P = H.jordan_form(QQbar,transformation=True)
    P = P.transpose().gram_schmidt(orthonormal=True)[0].transpose()

    # Compute block subdivisions
    edges = []
    for g in G:
        edges += (P.conjugate_transpose()*rho(g)*P).nonzero_positions()
    graph = Graph(edges, multiedges = False, loops = True)
    subrep_indices = sorted(graph.connected_components(), key=lambda x: x[0])    
    
    return P,subrep_indices  

def irr_decompose(rho,G,index = None):
    &quot;&quot;&quot;
    Decomposes rho into irreducible representations of G.
    Returns a change of basis matrix P and the indices of the block-decomposition of rho in this basis.
    &quot;&quot;&quot;
    n = rho(G.identity()).dimensions()[0]
    if index is None:
        index = range(n)
        
    # Test for irreducibility
    is_irred, H = is_irreducible(rho,G,n)
    
    if is_irred:
        subrep_indices = list(np.array(index)[range(n)])
        return H, [subrep_indices]
    else:
        P, subrep_indices = decompose(rho,G,H)
        print([list(np.array(index)[subrep_index]) for subrep_index in subrep_indices])

        new_subrep_indices = []
        new_P_list = []
        
        for subrep_index in subrep_indices:
            
            def subrep(g):
                return (P.inverse()*rho(g)*P)[subrep_index,subrep_index]
            new_P, new_indices = irr_decompose(subrep,G, list(np.array(index)[subrep_index]))
            
            new_subrep_indices += new_indices
            new_P_list += [new_P]
        
        return P*block_diagonal_matrix(new_P_list), new_subrep_indices

def show_irreps(rho,G,P,irrep_indices):
    subdivisions = [i for subrep_index in irrep_indices for i in subrep_index][1:]
    for subrep in irrep_indices:
        for i in subrep[1:]:
            subdivisions.remove(i)

    # Display rho in block-diagonal form
    for g in G:
        M = P.inverse()*rho(g)*P
        M.subdivide(subdivisions, subdivisions)
        show(M)

# Execute!
P,irrep_indices = irr_decompose(rho,G)
show_irreps(rho,G,P,irrep_indices)    
  &lt;/script&gt;
&lt;/div&gt;

&lt;h2 id=&quot;getting-all-irreducible-representations&quot;&gt;Getting all irreducible representations&lt;/h2&gt;

&lt;p&gt;Now we know how to test for irreducibility and decompose reducible representations. But we still don’t know how many irreducible representations a group has.&lt;/p&gt;

&lt;p&gt;It turns out that finite groups have finitely many irreducible representations! In the &lt;a href=&quot;/Group-Ring-Regular-Representation/&quot;&gt;next post&lt;/a&gt;, we’ll construct a representation for any finite group $G$ that contains &lt;em&gt;all&lt;/em&gt; the irreducible representations of $G$.&lt;/p&gt;

</description>
        <pubDate>Mon, 02 Feb 2015 00:00:00 +0000</pubDate>
        <link>http://sheaves.github.io/Representation-Theory-Decomposing-Representations/</link>
        <guid isPermaLink="true">http://sheaves.github.io/Representation-Theory-Decomposing-Representations/</guid>
      </item>
      
    
  </channel>
</rss>
