grep is a greenfield PHP extension project implemented in C and built as a shared object with phpize.
This repository keeps the upstream GNU grep source in vendor/grep and builds a native PHP module around a separate extension entrypoint. It is not a PHP userspace wrapper around the grep CLI.
Vendored upstream commit:
071ac3aa76a575dd55dc184570da2192adafe267
GNU grep is GPLv3-or-later. If this extension links against, embeds, or adapts GNU grep internals, the resulting combined work has GPL implications for distribution. That constraint is intentional and should stay explicit in project documentation and release artifacts.
The repository-level license notice is in LICENSE, and the full GNU GPLv3 text is vendored in vendor/grep/COPYING.
The tree now contains a real PHP extension skeleton:
- config.m4
- php_grep.h
- php_grep.c
- tests/*.phpt
- vendor/grep
The current vertical slice is intentionally small:
- module loads as grep.so
- exposes grep_version(): array
- exposes GNUGrep\Engine
- exposes GNUGrep\Pattern
- supports fixed-string matching via GNU grep's upstream Fcompile/Fexecute
- supports basic and extended regular expressions via GNU grep's upstream GEAcompile/EGexecute
- supports common PHP-style regex shorthands like \d, \D, \s, \S, \w, \W, \h, and \H on the GNU basic/extended regex path
- supports a substantial GNUGrep\Engine::run() slice for prominent grep --help switches
Implemented run() option slices:
- pattern modes: -G, -E, -F
- matcher controls: -i, -v, -w, -x
- recursive search: -r, -R, -d skip|recurse
- binary handling: -I, -a, -U, --binary-files=without-match|text|binary
- result shaping: -n, -c, -l, -L, -m
- output controls: -b, -H, -h, -o, -Z
- pattern sources: -e, -f
- file selection: --include, --exclude, --exclude-dir, --exclude-from
- context controls: -A, -B, -C, -NUM, --group-separator, --no-group-separator
- stdin and record modes: --label, -z
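Because run() accepts grep's own argv, standalone GNU grep is the behavioural reference for each of these slices. The shell sketch below assumes GNU grep 3.x is on PATH as grep. It shows two of the less familiar slices: context groups with a custom separator, and NUL-terminated records with -z. The extension returns structured arrays rather than this text, so only the matching semantics are expected to carry over. The function names are illustrative.

```shell
# Assumes GNU grep on PATH. Output is grep's text format; the extension
# returns arrays, so compare matching semantics rather than formatting.

# -C1 prints one line of context around each match; -n marks matching lines
# with ":" and context lines with "-"; non-adjacent groups are separated by
# the custom --group-separator string instead of the default "--".
context_demo() {
    printf 'a\nerror 1\nb\nc\nd\nerror 2\ne\n' |
        grep -n -C1 --group-separator='~~' 'error'
}

# -z treats NUL as the record terminator on input and output, so a record may
# contain newlines; tr turns the output terminators back into newlines.
null_record_demo() {
    printf 'alpha\0beta\0alphabet\0' | grep -z 'alpha' | tr '\0' '\n'
}
```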
PCRE mode, richer text-rendering flags like -T and --line-buffered, and colorized CLI formatting are still follow-up work. The extension is being built out engine-first, with matcher parity and benchmark harnesses added slice by slice.
To build the extension:

tools/build_upstream_grep.sh
phpize
./configure --enable-grep
make

The built module will be written to modules/grep.so.
The PHPT suite uses --EXTENSIONS-- grep, so the tests execute against the freshly built module.
Build standalone upstream GNU grep from the vendored source:

tools/build_upstream_grep.sh

Then run a side-by-side correctness and timing check against the extension:

tools/compare_with_upstream.sh

That harness:
- generates a deterministic benchmark fixture tree
- runs standalone GNU grep with -RnI 'abstract class'
- runs the extension on the same fixture
- diffs the normalized outputs
- records repeated wall-clock timings for both paths
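The normalize-and-diff step keeps the comparison honest, because the two implementations may not visit files in the same order. Below is a minimal sketch of that step against upstream grep alone. The helper names (make_fixture, normalized_upstream, compare_outputs) are hypothetical and are not the actual contents of tools/compare_with_upstream.sh.

```shell
# Hypothetical helpers; assumes GNU grep, sort and diff on PATH.

# Write a small, deterministic fixture tree under the directory given as $1.
make_fixture() {
    root=$1
    mkdir -p "$root/src/Model" "$root/assets"
    printf '<?php\nabstract class AlphaBase {}\n' > "$root/src/Model/Alpha.php"
    printf '<?php\nfinal class Beta extends AlphaBase {}\n' > "$root/src/Model/Beta.php"
    # Embedded NUL byte, so -I skips this file as binary.
    printf 'abstract class\0binary\n' > "$root/assets/blob.bin"
}

# Run upstream grep from inside the tree so paths are relative, then sort,
# because traversal order is not guaranteed to match across implementations.
normalized_upstream() {
    (cd "$1" && grep -RnI 'abstract class' .) | LC_ALL=C sort
}

# diff exits 0 only when the two normalized outputs are identical.
compare_outputs() {
    diff -u "$1" "$2"
}
```

In the real harness, the second input to the diff would be the extension's output, rendered into the same path:line:text shape first.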
That benchmark now exercises the global ggrep() helper for the extension side, so it measures the actual short userspace entrypoint instead of an older internal helper path.
If you want to benchmark in-memory grep work directly, use:
php -n -d extension=modules/grep.so tools/benchmark_ggrep_pipe.php \
'-iE fatal|panic|timeout' \
/path/to/captured-output.log \
100 \
'captured-output'

This is useful for shell_exec() or pipeline-style usage, where startup and filesystem traversal are not the main cost.
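For comparison, the plain CLI way to search captured text is to pipe it into grep and name the stream with --label. This sketch assumes GNU grep. Treating the benchmark's final 'captured-output' argument as the equivalent of --label is an assumption, not something this README states.

```shell
# Assumes GNU grep on PATH. -H forces a file-name prefix even for a single
# input, and --label supplies that name for standard input.
search_captured() {
    printf '%s\n' "$1" |
        grep -H -n --label=captured-output -iE 'fatal|panic|timeout'
}
```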
To confirm the module loads:

php -d extension=/absolute/path/to/modules/grep.so -r 'var_dump(grep_version());'

For a full PHP-visible reference, see docs/USERSPACE_API_REFERENCE.md.
A few examples of the class API:

<?php
// Fixed-string matcher backed by GNU grep's Fcompile/Fexecute path.
$pattern = GNUGrep\Pattern::fixedString('TODO');
var_dump($pattern->matches("TODO: wire GNU grep internals\n"));
var_dump(GNUGrep\Engine::versionInfo());
// One-shot extended-regex match against an in-memory buffer.
var_dump(GNUGrep\Engine::match(
'abstract class (Alpha|Beta)Base',
"abstract class BetaBase\n",
GNUGrep\Pattern::MODE_EXTENDED_REGEXP
));
// CLI-style argv: recursive, line-numbered, binary files skipped.
var_dump(GNUGrep\Engine::run(['-RnI', 'abstract class', __DIR__ . '/src']));
// Case-insensitive whole-word search.
var_dump(GNUGrep\Engine::run(['-Rniw', 'model', __DIR__ . '/src']));
// Several patterns via repeated -e.
var_dump(GNUGrep\Engine::run(['-Rn', '-e', 'alpha', '-e', 'beta', __DIR__ . '/src']));

ggrep() is now the shortest userspace entrypoint. Pass GNU grep-style args as a string or array, then give it either paths or in-memory text:
<?php
$literalMatches = ggrep(
'-F lamb',
'Mary had a little lamb'
);
// shell_exec() can return null or false, so fall back to an empty string.
$errorMatches = ggrep(
'-iE fatal|panic|timeout',
shell_exec('php artisan about 2>&1') ?: '',
'artisan about'
);
$httpBlob = <<<HEADERS
GET /checkout HTTP/1.1
Host: payments.internal.example
Authorization: Bearer redacted-token
X-Forwarded-For: 203.0.113.9
X-Request-Id: req-7f3a
HTTP/1.1 503 Service Unavailable
Set-Cookie: session=secret; HttpOnly; Secure
Content-Security-Policy: default-src 'self'
Strict-Transport-Security: max-age=31536000
CF-Ray: 89abc123-sea
HEADERS;
$headerMatches = ggrep(
'-niE ^(Host|Authorization|X-Forwarded-For|X-Request-Id|Set-Cookie|Content-Security-Policy|Strict-Transport-Security|CF-Ray):',
$httpBlob,
'checkout trace'
);
$tokenMatches = ggrep(
['-nE', '-e', '\d+', '-e', 'alpha\w+', '-e', 'space\shere'],
"id=42\nslug=alpha_beta\nspace here\n",
'token demo'
);
// Folder search, equivalent to a practical grep -RnI style search.
$classMatches = ggrep(
'-RnI abstract class',
__DIR__ . '/src'
);
// PHP 8.5 pipe operator works cleanly with a tiny wrapper.
$findLamb = fn(string $input): array => ggrep('-F lamb', $input);
$pipedMatches = 'Mary had a little lamb' |> $findLamb(...);

On the GNU basic and extended regex modes, the extension also accepts common PHP-style shorthand tokens such as \d, \D, \s, \S, \w, \W, \h, and \H.
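GNU grep itself already understands \w, \W, \s, and \S in basic and extended regexes, but not \d, \D, \h, or \H, so a bridge has to rewrite at least those. The sketch below is a hypothetical illustration of such a rewrite, not the extension's actual code. It approximates \h with [[:blank:]] and reproduces the token example from above with upstream grep -E.

```shell
# Hypothetical bridge, not the extension's code: rewrite PHP-style shorthands
# that GNU grep lacks into POSIX bracket expressions. \w, \W, \s and \S are
# GNU extensions already, so they pass through unchanged.
# Limitations: an escaped backslash such as \\d, or a shorthand inside an
# existing bracket expression, is not handled.
translate_shorthands() {
    printf '%s\n' "$1" | sed \
        -e 's/\\d/[[:digit:]]/g' \
        -e 's/\\D/[^[:digit:]]/g' \
        -e 's/\\h/[[:blank:]]/g' \
        -e 's/\\H/[^[:blank:]]/g'
}

# The token example, run through upstream grep -E (assumed on PATH).
token_demo() {
    printf 'id=42\nslug=alpha_beta\nspace here\n' |
        grep -nE -e "$(translate_shorthands '\d+')" -e 'alpha\w+' -e 'space\shere'
}
```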
For the common "grep folders like grep -RnI" case, the class helpers still exist:
<?php
use GNUGrep\Engine;
use GNUGrep\Pattern;
$matches = Engine::grep('abstract class', __DIR__ . '/src');
$matches = Engine::grep('BetaLeaf', [
__DIR__ . '/src',
__DIR__ . '/tests',
], Pattern::MODE_FIXED_STRING);
$matches = Engine::grepFixed('TODO', [
__DIR__ . '/src',
__DIR__ . '/docs',
]);

These helpers assume:
- recursive traversal
- line-numbered results
- binary files treated as without-match
- -R-style recursive directory handling
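The last assumption matters for symlinked trees. With -r, grep follows symbolic links only when they are named on the command line, while -R follows every symlink it meets during traversal. The sketch below counts matching files both ways using upstream GNU grep, which is assumed to be on PATH; symlink_demo is an illustrative name.

```shell
# Illustrative only: assumes GNU grep on PATH and a filesystem with symlinks.
symlink_demo() {
    root=$1
    mkdir -p "$root/tree" "$root/outside"
    printf 'abstract class Local {}\n' > "$root/tree/Local.php"
    printf 'abstract class Linked {}\n' > "$root/outside/Linked.php"
    # A directory symlink inside the searched tree, pointing outside it.
    ln -s ../outside "$root/tree/linked"
    # -l lists each matching file once; wc -l counts them.
    r=$(grep -rlI 'abstract class' "$root/tree" | wc -l | tr -d ' ')
    R=$(grep -RlI 'abstract class' "$root/tree" | wc -l | tr -d ' ')
    printf 'r=%s R=%s\n' "$r" "$R"
}
```

A tree that links into shared directories can therefore produce more, and possibly duplicated, results under -R-style traversal.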
Use ggrep() when you want the shortest userspace form. Use GNUGrep\Engine::run(array $argv) when you want exact CLI-style argv control. Use the class helpers when you want explicit path-only or buffer-only intent.
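The argv form maps one array element to one argument, much like "$@" in a shell script, so a pattern containing spaces needs no extra quoting. The sketch below uses the same call shape as Engine::run(['-Rniw', 'model', $dir]) from the earlier example, run through upstream GNU grep (assumed on PATH). run_argv_demo is a hypothetical helper.

```shell
# Hypothetical helper; assumes GNU grep on PATH.
run_argv_demo() {
    dir=$1
    # Each positional parameter is one argv element, like one PHP array entry.
    set -- -Rniw model "$dir"
    grep "$@"
}
```

Because of -w, only the whole word Model matches. $models and remodel() are rejected because the match must be bounded by non-word characters.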
If you want a quick inventory of PHP type declarations across one or more PSR-4 autoload roots, point GNUGrep\Engine::run() at those folders and search for the declaration forms you care about:
<?php
use GNUGrep\Engine;
$autoloadRoots = [
__DIR__ . '/src',
__DIR__ . '/modules/Billing/src',
];
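$matches = Engine::run([
'-RnI',
'-E',
// One pattern covers plain classes and stacked modifiers such as
// "final readonly class", which a single-modifier pattern would miss.
'-e', '^((abstract|final|readonly)[[:space:]]+)*class[[:space:]]+',
'-e', '^interface[[:space:]]+',
'-e', '^trait[[:space:]]+',
'-e', '^enum[[:space:]]+',
// Only scan PHP sources; -I above already skips binary files.
'--include=*.php',
...$autoloadRoots,
]);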
$lines = $matches
|> (static fn(array $rows): array => array_map(
static fn(array $match): string => sprintf(
'%s:%d %s',
$match['path'],
$match['line'],
$match['text']
),
$rows
));
echo implode(PHP_EOL, $lines), PHP_EOL;

That gives you a grep-style scan of the PSR-4 code roots while ignoring binary files and non-PHP assets. It is a good fit for codebase audits like "show me every abstract class, interface, trait, enum, or concrete class we autoload from these roots."
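The same inventory can be cross-checked with upstream GNU grep. This sketch assumes GNU grep on PATH; declaration_inventory is an illustrative name. The class pattern allows any run of abstract, final, or readonly modifiers, so stacked forms such as final readonly class are matched too. Output is sorted so that runs are comparable.

```shell
# Illustrative cross-check; assumes GNU grep and sort on PATH.
# Pass one or more autoload roots as arguments.
declaration_inventory() {
    grep -RnIE \
        -e '^((abstract|final|readonly)[[:space:]]+)*class[[:space:]]+' \
        -e '^interface[[:space:]]+' \
        -e '^trait[[:space:]]+' \
        -e '^enum[[:space:]]+' \
        --include='*.php' \
        "$@" | LC_ALL=C sort
}
```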
Next steps:
- Add a native compiled-program abstraction that owns GNU grep matcher state instead of rebuilding it per call.
- Close the remaining regex bridge gap for anchored ^...$ semantics in the generic (non -x) path.
- Add the remaining major CLI slices such as -P, plus any text-rendering-only flags that need a dedicated formatted-output API instead of structured arrays.
- Keep adding parity PHPTs and side-by-side benchmarks before expanding the CLI surface further.