我一直在尝试使用 scikit-learn 库来解决这个问题。大致如下:
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
# Make or load an n x p data matrix X and n x 1 array y of the corresponding
# function values.
poly = PolynomialFeatures(degree=2)
Xp = poly.fit_transform(X)
model = LinearRegression()
model.fit(Xp, y)
# Approximate the derivatives of the gradient and Hessian using the relevant
# finite-difference equations and model.predict.
如上所示,sklearn
设计选择将多项式回归分离为PolynomialFeatures
和,LinearRegression
而不是将它们组合成单个函数。这种分离具有概念上的优势,但也有一个主要缺点:它实际上阻止了model
提供方法gradient
和hessian
,model
如果这样做会更有用。
我目前的解决方法是使用有限差分方程和model.predict
来近似梯度和 Hessian 的元素(如此处所述。但我不喜欢这种方法——它对浮点误差很敏感,而且构建梯度和 Hessian 所需的“精确”信息已经包含在中model.coef_
。
有没有更优雅或更准确的方法来拟合 p 维多项式并在 Python 中找到其梯度和 Hessian?我可以使用不同的库。
更新。 在我看来,最简单的方法(对程序员来说最简单)似乎是使用simpy
,大致如 Jhon 的回答中所述。这是我的最终代码,作为对 Jhon 的回答的编辑提交的。
from numpy.random import rand
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sympy import symbols, prod, diff
from numpy import sum, linspace
n = 200
p = 4
X = rand(n, p)
y = rand(n)
poly = PolynomialFeatures(degree=2)
Xp = poly.fit_transform(X)
model = LinearRegression()
model.fit(Xp, y)
powers = poly.powers_
coeffs = model.coef_
syms = symbols(f'x0:{p}')
f = sum([coeffs[i]*prod(syms**powers[i]) for i in range(len(coeffs))])
Df = [diff(f, xj) for xj in syms]
H = [[diff(djf, xk) for xk in syms] for djf in Df]
print('Gradient:', Df)
print('Hessian:', H)
x0 = linspace(0.2, 0.7, p) # test point
Df_x0 = [Df[k].evalf(subs={syms[j]:x0[j] for j in range(p)}) for k in range(p)]
print('x0:', x0)
print('Gradient at x0:', Df_x0)
但使用simpy
不是必需的,正如 bb1 的出色回答所示。这是另一种编写 bb1 答案的方法,我发现这种方法更容易阅读。
from numpy import *
x0 = [0, 1]
# Define p, X, y, poly, model, powers, and coeffs.
grad = array(p * [float(0)])
for i in range(p):
gc = powers.T[i] * coeffs # gradient coefficients
gp = powers.copy() # gradient powers
gp[:,i] = maximum(0, gp[:,i] - 1)
grad[i] = sum([gc[j]*prod(x0**gp[j]) for j in range(len(coeffs))])
print('grad = ', grad)
hess = p * [None]
for i in range(p):
gc = powers.T[i] * coeffs
gp = powers.copy()
gp[:,i] = maximum(0, gp[:,i] - 1)
hess[i] = p * [float(0)]
for j in range(p):
hc = gp.T[j] * gc # Hessian coefficients
hp = gp.copy() # Hessian powers
hp[:,j] = maximum(0, hp[:,j] - 1)
hess[i][j] = sum([hc[k]*prod(x0**hp[k]) for k in range(len(coeffs))])
hess = array(hess)
print('hess = ', hess)
最佳答案
2
要计算多项式的梯度或 Hessian,需要知道每个单项式中变量的指数和相应的单项式系数。 这些信息的第一部分由 提供poly.powers_
,第二部分由 提供model.coef_
:
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
import numpy as np
np.set_printoptions(precision=2, suppress=True)
X = np.arange(6).reshape(3, 2)
y = np.arange(3)
poly = PolynomialFeatures(degree=2)
Xp = poly.fit_transform(X)
model = LinearRegression()
model.fit(Xp, y)
print("Exponents:")
print(poly.powers_.T)
print("Coefficients:")
print(model.coef_)
得出:
Exponents:
[[0 1 0 2 1 0]
[0 0 1 0 1 2]]
Coefficients:
[ 0. 0.13 0.13 -0.12 -0. 0.13]
然后可以使用以下函数来计算数组给定点的梯度x
:
def gradient(x, powers, coeffs):
x = np.array(x)
gp = np.maximum(0, powers[:, np.newaxis] - np.eye(powers.shape[1], dtype=int))
gp = gp.transpose(1, 2, 0)
gc = coeffs * powers.T
return (((x[:, np.newaxis] ** gp).prod(axis=1)) * gc).sum(axis=1)
例如,我们可以用它来计算点处的梯度[0, 1]
:
print(gradient([0, 1], poly.powers_, model.coef_))
得出:
[0.13 0.38]
可以用类似的方式计算给定点的 Hessian。
|
您可以执行符号微分来计算精确的梯度和 Hessian。由于多项式模型的系数在拟合后已知,因此您可以直接对多项式进行微分以符号形式获得其梯度和 Hessian 矩阵。
import sympy as sp
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
X = np.random.rand(100, 2)
y = np.random.rand(100)
poly = PolynomialFeatures(degree=2)
Xp = poly.fit_transform(X)
model = LinearRegression()
model.fit(Xp, y)
coef = model.coef_
intercept = model.intercept_
p = X.shape[1]
vars = sp.symbols(f'x0:{p}')
poly_eq = intercept + sum(coef[i] * (vars[0]**int(i > 0)) * (vars[1]**int(i > 1)) for i in range(len(coef)))
gradient = [sp.diff(poly_eq, v) for v in vars]
hessian = [[sp.diff(grad, v) for v in vars] for grad in gradient]
print("Gradient:", gradient)
print("Hessian:", hessian)
1
-
这会产生错误的结果,因为它
poly_eq
没有反映使用 sklearn 创建的多项式特征。具体来说,它假设在这些特征中,每个变量的指数要么为 0,要么为 1,但事实并非如此。一旦修复此问题,它应该可以正常工作。
–
|
|