Audio Feature Extraction with Python

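I enjoy playing rhythm games, so I plan to look into generating rhythm-game charts with deep learning models. This article mainly introduces and summarizes some audio fundamentals and the code that goes with them.

If you have never played one, a rhythm game looks roughly like this.

Now to the main topic.

A Google search turned up this article: Music Feature Extraction in Python. It is followed by a hands-on song classification exercise: Classification of Music into different Genres using keras.

The content below mainly follows these two articles, with some of my own understanding added. The outline is: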

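Introduction to sound signals

Classifying songs by genre with Keras

The main background knowledge involved:

Fourier transform

Sampling theorem

Python

Machine learning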

Sound basics

Audio signals

First, a definition of an audio signal from a Baidu search:

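An audio signal is an information carrier for the frequency and amplitude variations of regular sound waves that carry speech, music, and sound effects. Based on the characteristics of the sound wave, audio can be classified into regular audio and irregular sound. Regular audio can be further divided into speech, music, and sound effects. Regular audio is a continuously varying analog signal that can be represented by a continuous curve, called a sound wave. The three elements of sound are pitch, loudness, and timbre. A sound wave, or sine wave, has three important parameters: frequency, amplitude, and phase, and these determine the characteristics of the audio signal.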

In short: an audio signal is a superposition of sine waves with different frequencies, amplitudes, and phases.
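
The superposition idea can be sketched with NumPy alone. This is a minimal illustration, not code from the referenced articles: the function name `synthesize` and the three example components (440 Hz, 880 Hz, 1320 Hz) are assumptions chosen for the demonstration. It builds a signal by summing sinusoids and then checks that the spectrum peaks at the component frequencies.

```python
import numpy as np

def synthesize(components, sr=8000, duration=1.0):
    """Sum sinusoids given as (frequency_hz, amplitude, phase_rad) tuples."""
    t = np.arange(int(sr * duration)) / sr
    signal = np.zeros_like(t)
    for freq, amp, phase in components:
        signal += amp * np.sin(2 * np.pi * freq * t + phase)
    return t, signal

# Hypothetical example: a 440 Hz tone plus two weaker partials
components = [(440.0, 1.0, 0.0), (880.0, 0.5, np.pi / 4), (1320.0, 0.25, np.pi / 2)]
t, signal = synthesize(components)

# The magnitude spectrum should peak at the component frequencies
spectrum = np.abs(np.fft.rfft(signal))
freqs = np.fft.rfftfreq(len(signal), d=1 / 8000)
strongest = np.sort(freqs[np.argsort(spectrum)[-3:]])
print(strongest)
```

The summed signal takes both positive and negative values, and the Fourier transform recovers the individual components from the sum.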

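A typical sound looks roughly like this.

The horizontal axis is time and the vertical axis is the amplitude of the sound. Because the signal is essentially a superposition of sine waves, it takes both positive and negative values.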

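Frequency range of human hearing

Many kinds of waves exist in everyday life, but not all of them can be heard. Mobile phone signals, Wi-Fi signals, and sunlight are all waves, but they are electromagnetic waves rather than sound waves, so they cannot be heard.

A normal human ear hears frequencies from 20 Hz to 20,000 Hz. Sounds of the same intensity but different frequency are perceived with different loudness. The ear is most sensitive between 3000 Hz and 4000 Hz.

So for sound signals we basically only need to care about frequencies below 20,000 Hz.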

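Nyquist sampling theorem

Sound is essentially an analog signal, but to transmit it on a computer or other digital device we have to convert the analog signal to a digital one, which requires sampling.

The Nyquist sampling theorem states:

When converting an analog signal to a digital one, if the sampling frequency fs is greater than twice the highest frequency fmax in the signal (fs > 2 fmax), the sampled digital signal fully preserves the information in the original signal.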

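The statement is simple, and the proof is not difficult either. For sound, as long as the sampling frequency exceeds 2 × 20,000 Hz = 40,000 Hz, everything the ear can hear is preserved.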
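
What happens when the condition is violated can be shown with a small NumPy experiment. This is a sketch under assumed values: the helper name `dominant_frequency` and the test tones (3000 Hz and 5000 Hz at an 8000 Hz sampling rate) are illustrative choices, not taken from the referenced articles.

```python
import numpy as np

def dominant_frequency(tone_hz, sr, duration=1.0):
    """Sample a pure sine at rate sr and return the strongest frequency in its spectrum."""
    t = np.arange(int(sr * duration)) / sr
    samples = np.sin(2 * np.pi * tone_hz * t)
    spectrum = np.abs(np.fft.rfft(samples))
    freqs = np.fft.rfftfreq(len(samples), d=1 / sr)
    return float(freqs[np.argmax(spectrum)])

sr = 8000  # the highest representable frequency is sr / 2 = 4000 Hz
below = dominant_frequency(3000, sr)  # below the limit: shows up at 3000 Hz
above = dominant_frequency(5000, sr)  # above the limit: aliases to 8000 - 5000 = 3000 Hz
print(below, above)
```

Both tones produce the same peak after sampling, so the 5000 Hz tone can no longer be told apart from a 3000 Hz one. This is the information loss the theorem rules out when fs > 2 fmax.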

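As noted above, the ear is most sensitive around 4000 Hz, so a sampling rate of 8000 Hz is already enough to cover that range, and the code in this article uses 8000 Hz. (Music is more commonly distributed at 44.1 kHz, which covers the full 20,000 Hz hearing range.) As an example, take the recently popular song 芒种 (Mang Zhong).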

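The song runs 3 minutes 36 seconds, i.e. 216 seconds, and its standard-quality file is 3.3 MB. Sampled at 8000 Hz with 16 bits per sample, the file size would be:

216 s × 8000 samples/s × 16 bit ÷ 8 bit/byte = 3,456,000 bytes

which is roughly 3.3 MB. Note that 8000 × 16 = 128,000 bit/s is also a common MP3 bitrate (128 kbit/s), so the matching size does not by itself prove that the file was sampled at 8000 Hz.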
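
The same arithmetic as a short Python check. The function name `pcm_size_bytes` is an assumption for illustration. It computes the size of uncompressed audio only and ignores file headers and compression.

```python
def pcm_size_bytes(duration_s, sample_rate_hz, bit_depth, channels=1):
    """Size of uncompressed PCM audio in bytes (headers ignored)."""
    return duration_s * sample_rate_hz * bit_depth * channels // 8

# 216 seconds, 8000 Hz, 16 bit, mono
size = pcm_size_bytes(216, 8000, 16)
size_mib = size / 1024 ** 2
print(size, round(size_mib, 1))
```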

import librosa

# Load the audio and resample it to 8000 Hz; x is the waveform, sr the sampling rate
x, sr = librosa.load("visions.mp3", sr=8000)
print(x.shape, sr)

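# Jupyter notebook magic; remove this line when running as a plain script
%matplotlib inline
import matplotlib.pyplot as plt
import librosa.display

# Plot the waveform (amplitude over time)
plt.figure(figsize=(14, 5))
# waveshow replaces waveplot, which was removed in librosa 0.10
librosa.display.waveshow(x, sr=sr)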

# Short-time Fourier transform of the waveform
X = librosa.stft(x)
Xdb = librosa.amplitude_to_db(abs(X))   # convert amplitude to decibels
plt.figure(figsize=(14, 5))
librosa.display.specshow(Xdb, sr=sr, x_axis='time', y_axis='hz')
plt.colorbar()
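
The decibel conversion used for the spectrogram can be approximated in plain NumPy. This is a simplified sketch rather than librosa's implementation: the function name `amplitude_to_db_sketch`, the reference amplitude of 1.0, the floor `amin`, and the 80 dB dynamic range are assumptions chosen to resemble common defaults.

```python
import numpy as np

def amplitude_to_db_sketch(magnitudes, amin=1e-5, top_db=80.0):
    """Convert linear amplitudes to decibels relative to an amplitude of 1.0."""
    magnitudes = np.asarray(magnitudes, dtype=float)
    # Floor tiny values so the logarithm stays finite
    db = 20.0 * np.log10(np.maximum(magnitudes, amin))
    # Limit the dynamic range to top_db below the loudest value
    return np.maximum(db, db.max() - top_db)

db_values = amplitude_to_db_sketch([1.0, 0.1, 10.0])
clipped = amplitude_to_db_sketch([1.0, 1e-9])
print(db_values, clipped)
```

Each factor of 10 in amplitude corresponds to 20 dB, which is why a logarithmic scale makes quiet and loud parts of a spectrogram visible at the same time.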

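# Zoom in on 100 samples of the waveform
n0 = 9000
n1 = 9100
plt.figure(figsize=(14, 5))
plt.plot(x[n0:n1])
plt.grid()

# Count how many times the signal changes sign within that window
zero_crossings = librosa.zero_crossings(x[n0:n1], pad=False)
print(sum(zero_crossings))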
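
Counting zero crossings amounts to counting sign changes between neighbouring samples. The sketch below shows that idea in NumPy. The function name `count_zero_crossings` and the 10 Hz test tone are assumptions for illustration, and the treatment of exact zeros may differ from what librosa does.

```python
import numpy as np

def count_zero_crossings(samples):
    """Count sign changes between consecutive samples."""
    negative = np.signbit(np.asarray(samples, dtype=float))
    return int(np.sum(negative[1:] != negative[:-1]))

# Hypothetical test tone: 10 Hz sine, 1 second at 1000 Hz, with a small
# phase offset so that no sample lands exactly on zero
sr = 1000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 10 * t + 0.1)
crossings = count_zero_crossings(tone)
print(crossings)
```

A sine wave crosses zero twice per cycle, so ten cycles give twenty crossings. Noisy or percussive sounds change sign far more often, which is why this count is useful as a feature.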

# Spectral centroid: the centre of mass of the spectrum, i.e. the
# weighted mean of the frequencies present in the sound
from sklearn.preprocessing import minmax_scale

spectral_centroids = librosa.feature.spectral_centroid(y=x[:80000], sr=sr)[0]
# Time (in seconds) of each analysis frame, for visualisation
frames = range(len(spectral_centroids))
t = librosa.frames_to_time(frames, sr=sr)

# Scale values to [0, 1] so they can be overlaid on the waveform
def normalize(x, axis=0):
    return minmax_scale(x, axis=axis)

# Plot the spectral centroid on top of the waveform
librosa.display.waveshow(x[:80000], sr=sr, alpha=0.4)
plt.plot(t, normalize(spectral_centroids), color='r')
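
The "weighted mean of the frequencies" can be computed by hand for a single frame. This is a minimal sketch, assuming one unwindowed FFT frame. The function name `spectral_centroid_frame` and the 500 Hz and 3000 Hz test tones are illustrative. librosa works frame by frame on an STFT, so its numbers will not match this exactly.

```python
import numpy as np

def spectral_centroid_frame(frame, sr):
    """Magnitude-weighted mean frequency of one frame (frame must not be silent)."""
    magnitudes = np.abs(np.fft.rfft(frame))
    freqs = np.fft.rfftfreq(len(frame), d=1 / sr)
    return float(np.sum(freqs * magnitudes) / np.sum(magnitudes))

sr = 8000
t = np.arange(2048) / sr
low = np.sin(2 * np.pi * 500 * t)
high = np.sin(2 * np.pi * 3000 * t)

c_low = spectral_centroid_frame(low, sr)
c_high = spectral_centroid_frame(high, sr)
c_mix = spectral_centroid_frame(low + high, sr)  # equal weights: halfway between
print(c_low, c_high, c_mix)
```

A pure tone has its centroid at its own frequency, and an equal mix of two tones lands midway between them. Brighter sounds therefore have a higher centroid.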

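# Spectral rolloff: the frequency below which a fixed share (85% by default) of the spectral energy lies
spectral_rolloff = librosa.feature.spectral_rolloff(y=x, sr=sr)[0]
# Recompute the time axis, since this feature covers the whole signal
t = librosa.frames_to_time(range(len(spectral_rolloff)), sr=sr)
librosa.display.waveshow(x, sr=sr, alpha=0.4)
plt.plot(t, normalize(spectral_rolloff), color='r')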
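
Spectral rolloff can also be sketched for a single frame: accumulate the spectrum from low to high frequency and report where the running total first reaches a given share of the whole. The function name `spectral_rolloff_frame` and the test tones are assumptions for illustration. The 0.85 share is used here as an assumed default.

```python
import numpy as np

def spectral_rolloff_frame(frame, sr, roll_percent=0.85):
    """Lowest frequency at which the cumulative magnitude reaches roll_percent of the total."""
    magnitudes = np.abs(np.fft.rfft(frame))
    freqs = np.fft.rfftfreq(len(frame), d=1 / sr)
    cumulative = np.cumsum(magnitudes)
    threshold = roll_percent * cumulative[-1]
    # First bin whose cumulative magnitude is at least the threshold
    index = np.searchsorted(cumulative, threshold)
    return float(freqs[index])

sr = 8000
t = np.arange(2048) / sr
mostly_low = np.sin(2 * np.pi * 500 * t) + 0.1 * np.sin(2 * np.pi * 3000 * t)
balanced = np.sin(2 * np.pi * 500 * t) + np.sin(2 * np.pi * 3000 * t)

r_low = spectral_rolloff_frame(mostly_low, sr)
r_balanced = spectral_rolloff_frame(balanced, sr)
print(r_low, r_balanced)
```

When almost all of the magnitude sits at 500 Hz the rolloff stays there. With two equally strong tones the 85% mark is only reached at the higher one.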

# Mel-frequency cepstral coefficients (MFCCs); the shape is (n_mfcc, n_frames)
mfccs = librosa.feature.mfcc(y=x, sr=sr)
print(mfccs.shape)
# Display the MFCCs
librosa.display.specshow(mfccs, sr=sr, x_axis='time')

import os

import librosa
import numpy as np

# The ten genres; each has its own folder under ./genres/
genres = 'blues classical country disco hiphop jazz metal pop reggae rock'.split()
data_set = []
label_set = []
label2id = {genre: i for i, genre in enumerate(genres)}
id2label = {i: genre for i, genre in enumerate(genres)}
print(label2id)
for g in genres:
    print(g)
    for filename in os.listdir(f'./genres/{g}/'):
        songname = f'./genres/{g}/{filename}'
        # Load the first 30 seconds of each track as mono
        y, sr = librosa.load(songname, mono=True, duration=30)
        chroma_stft = librosa.feature.chroma_stft(y=y, sr=sr)
        rmse = librosa.feature.rms(y=y)
        spec_cent = librosa.feature.spectral_centroid(y=y, sr=sr)
        spec_bw = librosa.feature.spectral_bandwidth(y=y, sr=sr)
        rolloff = librosa.feature.spectral_rolloff(y=y, sr=sr)
        zcr = librosa.feature.zero_crossing_rate(y=y)
        mfcc = librosa.feature.mfcc(y=y, sr=sr)
        # One feature vector per track: the mean of each feature over time
        features = [np.mean(chroma_stft), np.mean(rmse), np.mean(spec_cent),
                    np.mean(spec_bw), np.mean(rolloff), np.mean(zcr)]
        # Append the mean of each MFCC coefficient
        features += [np.mean(e) for e in mfcc]
        data_set.append([float(v) for v in features])
        label_set.append(label2id[g])

from sklearn.preprocessing import StandardScaler
from keras.utils import to_categorical

# Standardize each feature to zero mean and unit variance
scaler = StandardScaler()
X = scaler.fit_transform(np.array(data_set, dtype=float))
# One-hot encode the genre ids
y = to_categorical(np.array(label_set))
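
What the two preprocessing steps do can be sketched in NumPy. The helper names `standardize` and `one_hot` and the toy data are assumptions for illustration. They mirror the usual behaviour of feature standardization and one-hot encoding and are not the library implementations.

```python
import numpy as np

def standardize(features):
    """Scale each column to zero mean and unit variance."""
    features = np.asarray(features, dtype=float)
    mean = features.mean(axis=0)
    std = features.std(axis=0)
    std[std == 0] = 1.0  # leave constant columns unscaled
    return (features - mean) / std

def one_hot(labels, num_classes):
    """Turn integer class ids into one-hot rows."""
    return np.eye(num_classes)[np.asarray(labels)]

# Toy data: three tracks, two features on very different scales
toy_features = [[0.3, 2000.0], [0.5, 2500.0], [0.4, 3000.0]]
scaled = standardize(toy_features)
encoded = one_hot([0, 2, 1], num_classes=3)
print(scaled)
print(encoded)
```

Standardizing matters here because features such as the spectral centroid are in the thousands while others, such as the zero-crossing rate, are below one. Without scaling, the large-valued features would dominate training.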

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

from keras import models
from keras.layers import Dense, Dropout
def create_model():
    model = models.Sequential()
    model.add(Dense(256, activation='relu', input_shape=(X_train.shape[1],)))
    model.add(Dense(128, activation='relu'))
    model.add(Dense(64, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(10, activation='softmax'))
    return model
model = create_model()

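model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# Train before evaluating. The training call is missing from this text, so
# the epochs and batch_size values here are illustrative assumptions.
model.fit(X_train, y_train, epochs=20, batch_size=128)

test_loss, test_acc = model.evaluate(X_test, y_test)
print('test_acc:', test_acc)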

Source: http://www.tuicool.com/articles/YVj2yqi
