I have been trying to use the scikit-learn library to solve this problem, roughly as follows:

from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

# Make or load an n x p data matrix X and n x 1 array y of the corresponding
# function values.

poly = PolynomialFeatures(degree=2)
Xp = poly.fit_transform(X)
model = LinearRegression()
model.fit(Xp, y)

# Approximate the elements of the gradient and Hessian using the relevant
# finite-difference equations and model.predict.

As shown above, sklearn made the design choice to split polynomial regression into PolynomialFeatures and LinearRegression rather than combining them into a single function. This separation has conceptual advantages, but it also has a major drawback: it effectively prevents model from offering gradient and hessian methods, which would make model considerably more useful.

My current workaround is to approximate the elements of the gradient and Hessian with finite-difference equations and model.predict (as described here). But I don't like this approach: it is sensitive to floating-point error, and the "exact" information needed to construct the gradient and Hessian is already contained in model.coef_.
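For concreteness, here is a minimal sketch of that workaround using central differences (the helper name fd_gradient and the step size h are my own illustrative choices, not part of my original code):

import numpy as np

def fd_gradient(x, poly, model, h=1e-5):
    # Central-difference estimate of the gradient of the fitted model at x.
    x = np.asarray(x, dtype=float)
    p = x.size
    grad = np.zeros(p)
    for i in range(p):
        e = np.zeros(p)
        e[i] = h
        f_plus = model.predict(poly.transform((x + e).reshape(1, -1)))[0]
        f_minus = model.predict(poly.transform((x - e).reshape(1, -1)))[0]
        grad[i] = (f_plus - f_minus) / (2 * h)
    return grad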

Is there a more elegant or more accurate way to fit a p-dimensional polynomial and find its gradient and Hessian in Python? I would be fine with using a different library.


Update. The simplest approach (simplest for the programmer) seems to be to use sympy, roughly as described in Jhon's answer. Here is my final code, submitted as an edit to Jhon's answer.

from numpy.random import rand
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sympy import symbols, prod, diff
from numpy import sum, linspace

n = 200
p = 4
X = rand(n, p)
y = rand(n)

poly = PolynomialFeatures(degree=2)
Xp = poly.fit_transform(X)
model = LinearRegression()
model.fit(Xp, y)

powers = poly.powers_   # exponent matrix: one row of exponents per feature
coeffs = model.coef_    # fitted coefficient for each feature
syms = symbols(f'x0:{p}')  # the p sympy symbols x0, ..., x{p-1}

# Rebuild the fitted polynomial symbolically. The intercept is omitted;
# it only shifts f by a constant and does not affect the derivatives.
f = sum([coeffs[i]*prod(syms**powers[i]) for i in range(len(coeffs))])

Df = [diff(f, xj) for xj in syms]
H = [[diff(djf, xk) for xk in syms] for djf in Df]
print('Gradient:', Df)
print('Hessian:', H)

x0 = linspace(0.2, 0.7, p)  # test point
Df_x0 = [Df[k].evalf(subs={syms[j]:x0[j] for j in range(p)}) for k in range(p)]
print('x0:', x0)
print('Gradient at x0:', Df_x0)
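If the gradient has to be evaluated at many points, evalf with subs is slow. sympy's lambdify can compile the symbolic derivatives into fast numpy-callable functions (a sketch, reusing Df, syms, and x0 from above):

from sympy import lambdify

# Compile each partial derivative into a function of the p variables.
grad_funcs = [lambdify(syms, d, modules='numpy') for d in Df]
print('Gradient at x0 (lambdify):', [g(*x0) for g in grad_funcs])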

But using sympy is not necessary, as bb1's excellent answer shows. Here is another way of writing bb1's answer that I find easier to read.

from numpy import *

x0 = [0, 1]

# Define p, X, y, poly, model, powers, and coeffs.

grad = array(p * [float(0)])
for i in range(p):
    gc = powers.T[i] * coeffs  # gradient coefficients
    gp = powers.copy()  # gradient powers
    gp[:,i] = maximum(0, gp[:,i] - 1)
    grad[i] = sum([gc[j]*prod(x0**gp[j]) for j in range(len(coeffs))])
print('grad = ', grad)

hess = p * [None]
for i in range(p):
    gc = powers.T[i] * coeffs
    gp = powers.copy()
    gp[:,i] = maximum(0, gp[:,i] - 1)
    hess[i] = p * [float(0)]
    for j in range(p):
        hc = gp.T[j] * gc  # Hessian coefficients
        hp = gp.copy()  # Hessian powers
        hp[:,j] = maximum(0, hp[:,j] - 1)
        hess[i][j] = sum([hc[k]*prod(x0**hp[k]) for k in range(len(coeffs))])
hess = array(hess)
print('hess = ', hess)


Best answer

To compute the gradient or Hessian of the polynomial, you need to know the exponents of the variables in each monomial and the corresponding monomial coefficients. The first piece of this information is provided by poly.powers_, and the second by model.coef_:

from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
import numpy as np

np.set_printoptions(precision=2, suppress=True)

X = np.arange(6).reshape(3, 2)
y = np.arange(3)
poly = PolynomialFeatures(degree=2)
Xp = poly.fit_transform(X)
model = LinearRegression()
model.fit(Xp, y)

print("Exponents:")
print(poly.powers_.T)
print("Coefficients:")
print(model.coef_)

This gives:

Exponents:
[[0 1 0 2 1 0]
 [0 0 1 0 1 2]]
Coefficients:
[ 0.    0.13  0.13 -0.12 -0.    0.13]

The following function can then be used to compute the gradient at a point given by the array x:

def gradient(x, powers, coeffs):
    x = np.array(x)
    # Lower each exponent by 1 in the differentiation variable, clipping at 0;
    # after the transpose, gp has shape (p, p, n_terms): [derivative variable, variable, term].
    gp = np.maximum(0, powers[:, np.newaxis] - np.eye(powers.shape[1], dtype=int))
    gp = gp.transpose(1, 2, 0)
    # Multiply each coefficient by the exponent of the differentiation variable;
    # gc has shape (p, n_terms).
    gc = coeffs * powers.T
    return (((x[:, np.newaxis] ** gp).prod(axis=1)) * gc).sum(axis=1)

For example, we can use it to compute the gradient at the point [0, 1]:

print(gradient([0, 1],  poly.powers_, model.coef_))

This gives:

[0.13 0.38]

The Hessian at a given point can be computed in a similar way; a sketch follows.
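Here is one way to write it, following the same vectorized pattern as gradient above (my own sketch, not taken verbatim from the answer). Differentiating a monomial with exponents e by x_j and then x_k multiplies its coefficient by e_j*(e_k - delta_jk) and lowers the exponents of x_j and x_k by one each.

def hessian(x, powers, coeffs):
    x = np.array(x)
    p = powers.shape[1]
    eye = np.eye(p, dtype=int)
    # Exponents after differentiating by x_j and then x_k, clipped at 0;
    # hp has shape (n_terms, p, p, p): [term, j, k, variable].
    hp = np.maximum(0, powers[:, np.newaxis, np.newaxis, :]
                    - eye[:, np.newaxis, :] - eye[np.newaxis, :, :])
    # Coefficient of each twice-differentiated monomial, c * e_j * (e_k - delta_jk);
    # hc has shape (n_terms, p, p).
    hc = (coeffs[:, np.newaxis, np.newaxis]
          * powers[:, :, np.newaxis]
          * (powers[:, np.newaxis, :] - eye))
    return (hc * (x ** hp).prod(axis=-1)).sum(axis=0)

print(hessian([0, 1], poly.powers_, model.coef_))

The result is a symmetric p x p array, as expected for a Hessian.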

You can perform symbolic differentiation to compute the exact gradient and Hessian. Since the coefficients of the polynomial model are known after fitting, you can differentiate the polynomial directly and obtain its gradient and Hessian in symbolic form.

import sympy as sp
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression


X = np.random.rand(100, 2)
y = np.random.rand(100)


poly = PolynomialFeatures(degree=2)
Xp = poly.fit_transform(X)
model = LinearRegression()
model.fit(Xp, y)


coef = model.coef_
intercept = model.intercept_

p = X.shape[1]
vars = sp.symbols(f'x0:{p}')

poly_eq = intercept + sum(coef[i] * (vars[0]**int(i > 0)) * (vars[1]**int(i > 1)) for i in range(len(coef)))

gradient = [sp.diff(poly_eq, v) for v in vars]


hessian = [[sp.diff(grad, v) for v in vars] for grad in gradient]


print("Gradient:", gradient)
print("Hessian:", hessian)


  • This produces incorrect results, because poly_eq does not reflect the polynomial features created with sklearn. Specifically, it assumes that each variable's exponent in those features is either 0 or 1, which is not the case. Once that is fixed, it should work fine.
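For reference, here is a corrected construction of poly_eq along the lines this comment suggests, reading the actual exponents from poly.powers_ (a sketch, not part of the original answer):

# Rebuild each monomial from its exponent row; coef[0] belongs to the bias
# column and is 0, so the intercept is added separately as before.
poly_eq = intercept + sum(
    coef[i] * sp.prod([vars[j] ** int(poly.powers_[i, j]) for j in range(p)])
    for i in range(len(coef))
)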

