Wine (WINE) Dataset Classification (PyTorch Implementation)

2024/5/19 6:19:13 Tags: pytorch, Bayes, support vector machine, neural network

I. Dataset Introduction

Data Set Information:
       These data are the results of a chemical analysis of wines grown in the same region in Italy but derived from three different cultivars. The analysis determined the quantities of 13 constituents found in each of the three types of wines.
       I think that the initial data set had around 30 variables, but for some reason I only have the 13-dimensional version. I had a list of what the 30 or so variables were, but a.) I lost it, and b.) I would not know which 13 variables are included in the set.

       The attributes are (donated by Riccardo Leardi, riclea ‘@’ anchem.unige.it )

  1. Alcohol
  2. Malic acid
  3. Ash
  4. Alcalinity of ash
  5. Magnesium
  6. Total phenols
  7. Flavanoids
  8. Nonflavanoid phenols
  9. Proanthocyanins
  10. Color intensity
  11. Hue
  12. OD280/OD315 of diluted wines
  13. Proline

       In a classification context, this is a well posed problem with “well behaved” class structures. A good data set for first testing of a new classifier, but not very challenging.
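
As a quick orientation, the sketch below loads the copy of this data set that ships with scikit-learn (`load_wine`, the same loader used in the later sections of this post) and prints its shape, feature names, and class sizes. It assumes scikit-learn and NumPy are installed; the variable names are chosen here for illustration.

```python
import numpy as np
from sklearn.datasets import load_wine

# Load the bundled copy of the wine data set
data = load_wine()
X, y = data.data, data.target

print(X.shape)             # (number of samples, number of features)
print(data.feature_names)  # names of the 13 measured constituents
print(np.bincount(y))      # number of samples in each of the three classes
```

The three classes are not perfectly balanced, which is worth keeping in mind when reading a single accuracy figure.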

Attribute Information:
       All attributes are continuous

       No statistics available, but suggest to standardise variables for certain uses (e.g. for use with classifiers which are NOT scale invariant)

       NOTE: 1st attribute is class identifier (1-3)
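
The standardisation suggested above can be sketched with scikit-learn's `StandardScaler`, which rescales every column to zero mean and unit variance. This is a minimal illustration on the full data set, not part of the original post's workflow; in a real experiment the scaler would normally be fitted on the training split only. Note also that the scikit-learn copy keeps the class label in a separate `target` array, numbered 0-2 rather than 1-3 as in the raw file.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)

# Raw feature scales differ widely (compare proline with hue)
print(X.mean(axis=0).round(2))

# Standardise: subtract the column mean, divide by the column standard deviation
X_std = StandardScaler().fit_transform(X)
print(X_std.mean(axis=0).round(2))  # approximately 0 for every column
print(X_std.std(axis=0).round(2))   # approximately 1 for every column

# Class labels in the scikit-learn copy
print(np.unique(y))
```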

II. Classification with Naive Bayes

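       The code first loads the WINE dataset and separates the features from the labels, then splits the data into training and test sets and converts them to PyTorch tensors. It then computes the prior probability of each class together with the per-feature mean and standard deviation of each class, and defines a naive Bayes classifier. Finally, it predicts on the test set and computes the accuracy. Note that this example uses PyTorch's normal distribution (its log-density, `log_prob`) to compute the likelihood of each feature, because the features of the WINE dataset are continuous. If the features were discrete, the likelihood would instead be computed with the probability mass function of a multinomial distribution.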
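
Written out, the rule described above picks the class with the largest log-posterior score. The notation is introduced here for illustration and does not appear in the original post: \pi_c is the prior of class c, and \mu_{c,j}, \sigma_{c,j} are the mean and standard deviation of feature j within class c.

```latex
\hat{y}(x) = \arg\max_{c \in \{0,1,2\}}
  \left[ \log \pi_c + \sum_{j=1}^{13} \log \mathcal{N}\!\left(x_j \mid \mu_{c,j}, \sigma_{c,j}^2\right) \right],
\qquad
\log \mathcal{N}\!\left(x_j \mid \mu, \sigma^2\right)
  = -\frac{(x_j - \mu)^2}{2\sigma^2} - \log \sigma - \frac{1}{2}\log(2\pi)
```

Working with sums of log-densities rather than products of densities avoids numerical underflow when many features are multiplied together.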

import torch
import numpy as np
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split

# Load the WINE dataset
data = load_wine()

# Data preprocessing: separate features and labels
X = data.data
y = data.target

# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Convert to PyTorch tensors
X_train = torch.from_numpy(X_train).float()
y_train = torch.from_numpy(y_train).long()
X_test = torch.from_numpy(X_test).float()
y_test = torch.from_numpy(y_test).long()

# Compute the prior probability of each class
priors = []
for c in range(3):
    priors.append((y_train == c).sum().item() / len(y_train))

# Compute the per-feature mean and standard deviation of each class
means = []
stds = []
for c in range(3):
    X_c = X_train[y_train == c]
    mean_c = X_c.mean(dim=0)
    std_c = X_c.std(dim=0)
    means.append(mean_c)
    stds.append(std_c)

# Define the naive Bayes classifier
def predict(X):
    scores = []
    for c in range(3):
        log_prior = np.log(priors[c])
        # Sum the per-feature Gaussian log-densities (naive independence assumption)
        log_likelihood = torch.distributions.Normal(means[c], stds[c]).log_prob(X).sum(dim=1)
        score_c = log_prior + log_likelihood
        scores.append(score_c)
    # Stack to shape (n_samples, 3) and pick the highest-scoring class
    scores = torch.stack(scores, dim=1)
    _, predicted = torch.max(scores, 1)
    return predicted

# Predict on the test set
y_pred = predict(X_test)
accuracy = (y_pred == y_test).sum().item() / len(y_test)
print('Accuracy on test set: %.2f%%' % (accuracy * 100))
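
One way to sanity-check a hand-written classifier like this is to compare it against a library implementation on the same split. The sketch below uses scikit-learn's `GaussianNB` as that reference; it is an addition for illustration, not part of the original workflow. Small differences are possible because `GaussianNB` uses the maximum-likelihood variance estimate plus a small smoothing term, whereas `torch.std` defaults to the unbiased estimate.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_wine(return_X_y=True)

# Same split parameters as in the PyTorch version
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# GaussianNB fits per-class priors, means, and variances
gnb = GaussianNB()
gnb.fit(X_train, y_train)

# score() returns the mean accuracy on the given data
reference_accuracy = gnb.score(X_test, y_test)
print('GaussianNB accuracy on test set: %.2f%%' % (reference_accuracy * 100))
```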


III. Classification with a Support Vector Machine

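       The code first loads the WINE dataset and preprocesses it (here, simply separating features and labels), then splits the data into training and test sets and converts them to PyTorch tensors. It then trains a support vector machine classifier, using a linear kernel with the parameter C set to 1.0. Finally, it predicts on the test set and computes the accuracy.
       PyTorch itself does not provide an SVM classifier, so we use the SVC class from scikit-learn to train one. Before training, the PyTorch tensors are converted back to NumPy arrays, because SVC takes NumPy arrays as input. Likewise, at prediction time the test-set tensor is converted to a NumPy array.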

import torch
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Load the WINE dataset
data = load_wine()

# Data preprocessing: separate features and labels
X = data.data
y = data.target

# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Convert to PyTorch tensors (y_test is left as a NumPy array)
X_train = torch.from_numpy(X_train).float()
y_train = torch.from_numpy(y_train).long()
X_test = torch.from_numpy(X_test).float()

# Train the SVM classifier (SVC expects NumPy arrays, so convert back)
clf = SVC(kernel='linear', C=1.0)
clf.fit(X_train.numpy(), y_train.numpy())

# Predict on the test set; both y_pred and y_test are NumPy arrays here
y_pred = clf.predict(X_test.numpy())
accuracy = (y_pred == y_test).sum().item() / len(y_test)
print('Accuracy on test set: %.2f%%' % (accuracy * 100))
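
SVMs are not scale invariant, so the data set notes' advice to standardise the variables applies here. The following sketch is a variation on the code above rather than the original method: it wraps `StandardScaler` and `SVC` in a scikit-learn pipeline so the scaler is fitted on the training split only. The names `svm_pipeline` and `scaled_accuracy` are introduced for illustration.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# fit() learns the scaling statistics from X_train only, then trains the SVC
svm_pipeline = make_pipeline(StandardScaler(), SVC(kernel='linear', C=1.0))
svm_pipeline.fit(X_train, y_train)

# score() applies the same scaling to X_test before predicting
scaled_accuracy = svm_pipeline.score(X_test, y_test)
print('Accuracy on test set (standardised): %.2f%%' % (scaled_accuracy * 100))
```

Since everything here stays in NumPy, the round trip through PyTorch tensors is not needed for this variant.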


IV. Classification with a Neural Network

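       The code first loads the WINE dataset and preprocesses it by standardizing the features with StandardScaler, then splits the data into training and test sets and converts them to PyTorch tensors. It then defines a neural network with three fully connected layers and trains it with the cross-entropy loss and the Adam optimizer. Finally, it predicts on the test set and computes the accuracy.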

import torch
import torch.nn as nn
import torch.optim as optim
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Load the WINE dataset
data = load_wine()

# Data preprocessing: standardize the features
# Note: the scaler is fitted on the full dataset before the split,
# so test-set statistics influence the scaling
X = data.data
y = data.target
scaler = StandardScaler()
X = scaler.fit_transform(X)

# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Convert to PyTorch tensors
X_train = torch.from_numpy(X_train).float()
y_train = torch.from_numpy(y_train).long()
X_test = torch.from_numpy(X_test).float()
y_test = torch.from_numpy(y_test).long()

# Define the neural network: 13 inputs -> 64 -> 32 -> 3 class scores
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # Fix the random seed so the weight initialization is reproducible
        torch.manual_seed(2)
        self.fc1 = nn.Linear(13, 64)
        self.fc2 = nn.Linear(64, 32)
        self.fc3 = nn.Linear(32, 3)

    def forward(self, x):
        x = torch.relu(self.fc1(x))
        x = torch.relu(self.fc2(x))
        # Return raw logits; CrossEntropyLoss applies log-softmax internally
        x = self.fc3(x)
        return x

net = Net()

# Define the loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(net.parameters(), lr=0.001)

# Train the neural network (full-batch gradient steps)
for epoch in range(100):
    optimizer.zero_grad()
    output = net(X_train)
    loss = criterion(output, y_train)
    loss.backward()
    optimizer.step()

    if epoch % 10 == 0:
        print('Epoch %d Loss: %.4f' % (epoch, loss.item()))

# Predict on the test set
with torch.no_grad():
    output = net(X_test)
    _, predicted = torch.max(output, 1)
    total = y_test.size(0)
    correct = (predicted == y_test).sum().item()
    accuracy = correct / total
    print('Accuracy on test set: %.2f%%' % (accuracy * 100))
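
In the code above the scaler is fitted on the whole data set before the split, so the test samples contribute to the scaling statistics. The sketch below is a variant, not the original method, that splits first and fits `StandardScaler` on the training split only. It keeps the same layer sizes, loss, optimizer, and epoch count, and uses `nn.Sequential` purely for brevity; the `losses` list is added here to track training progress.

```python
import torch
import torch.nn as nn
import torch.optim as optim
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

torch.manual_seed(2)  # reproducible weight initialization

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit the scaler on the training split only, then apply it to both splits
scaler = StandardScaler().fit(X_train)
X_train = torch.from_numpy(scaler.transform(X_train)).float()
X_test = torch.from_numpy(scaler.transform(X_test)).float()
y_train = torch.from_numpy(y_train).long()
y_test = torch.from_numpy(y_test).long()

# Same architecture as above: 13 -> 64 -> 32 -> 3
net = nn.Sequential(
    nn.Linear(13, 64), nn.ReLU(),
    nn.Linear(64, 32), nn.ReLU(),
    nn.Linear(32, 3),
)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(net.parameters(), lr=0.001)

# Full-batch training, recording the loss at every epoch
losses = []
for epoch in range(100):
    optimizer.zero_grad()
    loss = criterion(net(X_train), y_train)
    loss.backward()
    optimizer.step()
    losses.append(loss.item())

# Evaluate on the held-out test split
with torch.no_grad():
    predicted = net(X_test).argmax(dim=1)
    accuracy = (predicted == y_test).float().mean().item()
print('Accuracy on test set: %.2f%%' % (accuracy * 100))
```

On a data set this small the difference from the original ordering is likely to be minor, but fitting preprocessing on the training data alone keeps the test accuracy an honest estimate.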



http://www.niftyadmin.cn/n/80622.html
