I have been trying to use the scikit-learn library to solve this problem, roughly as follows:

from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

# Make or load an n x p data matrix X and n x 1 array y of the corresponding
# function values.

poly = PolynomialFeatures(degree=2)
Xp = poly.fit_transform(X)
model = LinearRegression()
model.fit(Xp, y)

# Approximate the elements of the gradient and Hessian using the relevant
# finite-difference equations and model.predict.

As shown above, sklearn made the design choice to split polynomial regression into PolynomialFeatures and LinearRegression rather than combining them into a single function. This separation has conceptual advantages, but it also has a major drawback: it effectively prevents model from offering gradient and hessian methods, which would make model considerably more useful.

My current workaround is to approximate the elements of the gradient and Hessian with finite-difference equations and model.predict (as described here). But I don't like this approach: it is sensitive to floating-point error, and the "exact" information needed to construct the gradient and Hessian is already contained in model.coef_.
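For concreteness, here is a minimal sketch of that workaround using central differences (the helper name fd_gradient and the step size h are my own illustrative choices, not part of my original code):

import numpy as np

def fd_gradient(x, poly, model, h=1e-5):
    # Central-difference estimate of the gradient of the fitted model at x.
    x = np.asarray(x, dtype=float)
    p = x.size
    grad = np.zeros(p)
    for i in range(p):
        e = np.zeros(p)
        e[i] = h
        f_plus = model.predict(poly.transform((x + e).reshape(1, -1)))[0]
        f_minus = model.predict(poly.transform((x - e).reshape(1, -1)))[0]
        grad[i] = (f_plus - f_minus) / (2 * h)
    return grad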

Is there a more elegant or more accurate way to fit a p-dimensional polynomial and find its gradient and Hessian in Python? I would be fine with using a different library.


Update. The simplest approach (simplest for the programmer) seems to be to use sympy, roughly as described in Jhon's answer. Here is my final code, submitted as an edit to Jhon's answer.

from numpy.random import rand
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sympy import symbols, prod, diff
from numpy import sum, linspace

n = 200
p = 4
X = rand(n, p)
y = rand(n)

poly = PolynomialFeatures(degree=2)
Xp = poly.fit_transform(X)
model = LinearRegression()
model.fit(Xp, y)

powers = poly.powers_   # exponent matrix: one row of exponents per feature
coeffs = model.coef_    # fitted coefficient for each feature
syms = symbols(f'x0:{p}')  # the p sympy symbols x0, ..., x{p-1}

# Rebuild the fitted polynomial symbolically. The intercept is omitted;
# it only shifts f by a constant and does not affect the derivatives.
f = sum([coeffs[i]*prod(syms**powers[i]) for i in range(len(coeffs))])

Df = [diff(f, xj) for xj in syms]
H = [[diff(djf, xk) for xk in syms] for djf in Df]
print('Gradient:', Df)
print('Hessian:', H)

x0 = linspace(0.2, 0.7, p)  # test point
Df_x0 = [Df[k].evalf(subs={syms[j]:x0[j] for j in range(p)}) for k in range(p)]
print('x0:', x0)
print('Gradient at x0:', Df_x0)
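If the gradient has to be evaluated at many points, evalf with subs is slow. sympy's lambdify can compile the symbolic derivatives into fast numpy-callable functions (a sketch, reusing Df, syms, and x0 from above):

from sympy import lambdify

# Compile each partial derivative into a function of the p variables.
grad_funcs = [lambdify(syms, d, modules='numpy') for d in Df]
print('Gradient at x0 (lambdify):', [g(*x0) for g in grad_funcs])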

But using sympy is not necessary, as bb1's excellent answer shows. Here is another way of writing bb1's answer that I find easier to read.

from numpy import *

x0 = [0, 1]

# Define p, X, y, poly, model, powers, and coeffs.

grad = array(p * [float(0)])
for i in range(p):
    gc = powers.T[i] * coeffs  # gradient coefficients
    gp = powers.copy()  # gradient powers
    gp[:,i] = maximum(0, gp[:,i] - 1)
    grad[i] = sum([gc[j]*prod(x0**gp[j]) for j in range(len(coeffs))])
print('grad = ', grad)

hess = p * [None]
for i in range(p):
    gc = powers.T[i] * coeffs
    gp = powers.copy()
    gp[:,i] = maximum(0, gp[:,i] - 1)
    hess[i] = p * [float(0)]
    for j in range(p):
        hc = gp.T[j] * gc  # Hessian coefficients
        hp = gp.copy()  # Hessian powers
        hp[:,j] = maximum(0, hp[:,j] - 1)
        hess[i][j] = sum([hc[k]*prod(x0**hp[k]) for k in range(len(coeffs))])
hess = array(hess)
print('hess = ', hess)


Best answer

To compute the gradient or Hessian of the polynomial, you need to know the exponents of the variables in each monomial and the corresponding monomial coefficients. The first piece of this information is provided by poly.powers_, and the second by model.coef_:

from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
import numpy as np

np.set_printoptions(precision=2, suppress=True)

X = np.arange(6).reshape(3, 2)
y = np.arange(3)
poly = PolynomialFeatures(degree=2)
Xp = poly.fit_transform(X)
model = LinearRegression()
model.fit(Xp, y)

print("Exponents:")
print(poly.powers_.T)
print("Coefficients:")
print(model.coef_)

This gives:

Exponents:
[[0 1 0 2 1 0]
 [0 0 1 0 1 2]]
Coefficients:
[ 0.    0.13  0.13 -0.12 -0.    0.13]

The following function can then be used to compute the gradient at a point given by the array x:

def gradient(x, powers, coeffs):
    x = np.array(x)
    # Lower each exponent by 1 in the differentiation variable, clipping at 0;
    # after the transpose, gp has shape (p, p, n_terms): [derivative variable, variable, term].
    gp = np.maximum(0, powers[:, np.newaxis] - np.eye(powers.shape[1], dtype=int))
    gp = gp.transpose(1, 2, 0)
    # Multiply each coefficient by the exponent of the differentiation variable;
    # gc has shape (p, n_terms).
    gc = coeffs * powers.T
    return (((x[:, np.newaxis] ** gp).prod(axis=1)) * gc).sum(axis=1)

For example, we can use it to compute the gradient at the point [0, 1]:

print(gradient([0, 1],  poly.powers_, model.coef_))

This gives:

[0.13 0.38]

The Hessian at a given point can be computed in a similar way; a sketch follows.
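Here is one way to write it, following the same vectorized pattern as gradient above (my own sketch, not taken verbatim from the answer). Differentiating a monomial with exponents e by x_j and then x_k multiplies its coefficient by e_j*(e_k - delta_jk) and lowers the exponents of x_j and x_k by one each.

def hessian(x, powers, coeffs):
    x = np.array(x)
    p = powers.shape[1]
    eye = np.eye(p, dtype=int)
    # Exponents after differentiating by x_j and then x_k, clipped at 0;
    # hp has shape (n_terms, p, p, p): [term, j, k, variable].
    hp = np.maximum(0, powers[:, np.newaxis, np.newaxis, :]
                    - eye[:, np.newaxis, :] - eye[np.newaxis, :, :])
    # Coefficient of each twice-differentiated monomial, c * e_j * (e_k - delta_jk);
    # hc has shape (n_terms, p, p).
    hc = (coeffs[:, np.newaxis, np.newaxis]
          * powers[:, :, np.newaxis]
          * (powers[:, np.newaxis, :] - eye))
    return (hc * (x ** hp).prod(axis=-1)).sum(axis=0)

print(hessian([0, 1], poly.powers_, model.coef_))

The result is a symmetric p x p array, as expected for a Hessian.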

You can perform symbolic differentiation to compute the exact gradient and Hessian. Since the coefficients of the polynomial model are known after fitting, you can differentiate the polynomial directly and obtain its gradient and Hessian in symbolic form.

import sympy as sp
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression


X = np.random.rand(100, 2)
y = np.random.rand(100)


poly = PolynomialFeatures(degree=2)
Xp = poly.fit_transform(X)
model = LinearRegression()
model.fit(Xp, y)


coef = model.coef_
intercept = model.intercept_

p = X.shape[1]
vars = sp.symbols(f'x0:{p}')

poly_eq = intercept + sum(coef[i] * (vars[0]**int(i > 0)) * (vars[1]**int(i > 1)) for i in range(len(coef)))

gradient = [sp.diff(poly_eq, v) for v in vars]


hessian = [[sp.diff(grad, v) for v in vars] for grad in gradient]


print("Gradient:", gradient)
print("Hessian:", hessian)


  • This produces incorrect results, because poly_eq does not reflect the polynomial features created with sklearn. Specifically, it assumes that each variable's exponent in those features is either 0 or 1, which is not the case. Once that is fixed, it should work fine.
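For reference, here is a corrected construction of poly_eq along the lines this comment suggests, reading the actual exponents from poly.powers_ (a sketch, not part of the original answer):

# Rebuild each monomial from its exponent row; coef[0] belongs to the bias
# column and is 0, so the intercept is added separately as before.
poly_eq = intercept + sum(
    coef[i] * sp.prod([vars[j] ** int(poly.powers_[i, j]) for j in range(p)])
    for i in range(len(coef))
)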

